Abstract
Obtaining value from data through analysis often requires significant prior effort on data preparation. Data preparation covers the discovery, selection, integration and cleaning of existing data sets into a form that is suitable for analysis. Data preparation, also known as data wrangling or extract, transform, load (ETL), is reported to take 80% of data scientists' time. How can this time be reduced? Can it be reduced by automation? There have been significant results on the automation of individual steps within the data wrangling process, and there are now a few proposals for end-to-end automation. This paper reviews the state of the art and asks the following questions: Can we automate data preparation – what techniques are already available? Should we – what data preparation activities seem likely to be carried out better by software than by human experts? Must we – what data preparation challenges cannot realistically be met by manual approaches?
Original language | English |
---|---|
Title of host publication | Proceedings of the 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data |
Publication status | Published - 2019 |
Event | 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data - Lisbon, Portugal Duration: 26 Mar 2019 → … |
Conference
Conference | 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data |
---|---|
Abbreviated title | DOLAP 2019 |
Country/Territory | Portugal |
City | Lisbon |
Period | 26/03/19 → … |