Automating Data Preparation: Can We? Should We? Must We?

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Obtaining value from data through analysis often requires significant prior effort on data preparation. Data preparation covers the discovery, selection, integration and cleaning of existing data sets into a form that is suitable for analysis. Data preparation, also known as data wrangling or extract, transform, load (ETL), is reported as taking 80% of the time of data scientists. How can this time be reduced? Can it be reduced by automation? There have been significant results on the automation of individual steps within the data wrangling process, and there are now a few proposals for end-to-end automation. This paper reviews the state of the art and asks the following questions. Can we automate data preparation: what techniques are already available? Should we: which data preparation activities seem likely to be carried out better by software than by human experts? Must we: which data preparation challenges cannot realistically be met by manual approaches?
Original language: English
Title of host publication: Proceedings of the 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
Publication status: Published - 2019
Event: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data - Lisbon, Portugal
Duration: 26 Mar 2019 → …

Conference

Conference: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
Abbreviated title: DOLAP 2019
Country/Territory: Portugal
City: Lisbon
Period: 26/03/19 → …
