Data Wrangling for Big Data: Challenges and Opportunities

Tim Furche, George Gottlob, Leonid Libkin, Giorgio Orsi, Norman Paton

    Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review


    Abstract

    Data wrangling is the process by which the data required by an application is identified, extracted, cleaned and integrated, to yield a data set that is suitable for exploration and analysis. Although there are widely used Extract, Transform and Load (ETL) techniques and platforms, they often require manual work from technical and domain experts at different stages of the process. When ETL is confronted with the 4 V’s of big data (volume, velocity, variety and veracity), manual intervention may make it prohibitively expensive. This paper argues that providing cost-effective, highly automated approaches to data wrangling involves significant research challenges, requiring fundamental changes to established areas such as data extraction, integration and cleaning, and to the ways in which these areas are brought together. Specifically, the paper discusses the importance of comprehensive support for context awareness within data wrangling, and the need for adaptive, pay-as-you-go solutions that automatically tune the wrangling process to the requirements and resources of the specific application.
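    As a toy illustration only (this code is not from the paper), the Python sketch below walks through the identify-extract-clean-integrate pipeline the abstract describes, with a crude pay-as-you-go flavour: cheap automated repairs run first, and implausible values are flagged rather than guessed at. All names here (Record, extract, clean, integrate) are hypothetical.

    # Hypothetical sketch of a minimal wrangling pipeline; names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Record:
        source: str
        name: str
        price: float | None

    def extract(raw_rows: list[dict]) -> list[Record]:
        """Identify and extract only the fields the application needs."""
        return [Record(r.get("src", "unknown"), r.get("name", "").strip(),
                       r.get("price")) for r in raw_rows]

    def clean(records: list[Record]) -> list[Record]:
        """Cheap automated cleaning: drop unusable rows, flag bad values."""
        cleaned = []
        for rec in records:
            if not rec.name:               # unusable without a key
                continue
            if rec.price is not None and rec.price < 0:
                rec.price = None           # flag implausible values, don't guess
            cleaned.append(rec)
        return cleaned

    def integrate(batches: list[list[Record]]) -> dict[str, Record]:
        """Merge per-source batches, keeping the most complete record per key."""
        merged: dict[str, Record] = {}
        for batch in batches:
            for rec in batch:
                best = merged.get(rec.name)
                if best is None or (best.price is None and rec.price is not None):
                    merged[rec.name] = rec
        return merged

    if __name__ == "__main__":
        sources = [
            [{"src": "a", "name": "widget", "price": -1.0}],
            [{"src": "b", "name": "widget", "price": 9.99}, {"src": "b", "name": ""}],
        ]
        result = integrate([clean(extract(rows)) for rows in sources])
        print(result)  # {'widget': Record(source='b', name='widget', price=9.99)}

    The point of the sketch is the division of labour the abstract argues for: the fully automated steps do what they can cheaply, and anything they cannot resolve with confidence is surfaced (here, as a None price) so that costlier effort is spent only where the application actually needs it.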
    Original language: English
    Title of host publication: Advances in Database Technology — EDBT 2016
    Subtitle of host publication: Proceedings of the 19th International Conference on Extending Database Technology
    Pages: 473–478
    DOIs
    Publication status: Published - 1 Nov 2016
