Abstract
Data wrangling is the process by which the data required by an application
is identified, extracted, cleaned and integrated, to yield a
data set that is suitable for exploration and analysis. Although there
are widely used Extract, Transform and Load (ETL) techniques and
platforms, they often require manual work from technical and domain
experts at different stages of the process. When confronted
with the 4 V’s of big data (volume, velocity, variety and veracity),
manual intervention may make ETL prohibitively expensive. This
paper argues that providing cost-effective, highly automated approaches
to data wrangling involves significant research challenges,
requiring fundamental changes to established areas such as data extraction,
integration and cleaning, and to the ways in which these
areas are brought together. Specifically, the paper discusses the importance
of comprehensive support for context awareness within
data wrangling, and the need for adaptive, pay-as-you-go solutions
that automatically tune the wrangling process to the requirements
and resources of the specific application.
Original language | English |
---|---|
Title of host publication | Advances in Database Technology — EDBT 2016 |
Subtitle of host publication | Proceedings of the 19th International Conference on Extending Database Technology |
Pages | 473-478 |
Publication status | Published - 1 Nov 2016 |