Data Context Informed Data Wrangling

Martin Koehler, Alex Bogatu, Cristina Civili, Nikolaos Konstantinou, Edward Abel, Alvaro Fernandes, John Keane, Leonid Libkin, Norman Paton

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review


Abstract

The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using Extract-Transform-Load platforms, with significant manual involvement in specifying, configuring or tuning many of them. Cost-effective data wrangling processes need to ensure that data wrangling steps benefit from automation wherever possible. In this paper, we define a methodology to fully automate an end-to-end data wrangling process incorporating data context, which associates portions of a target schema with potentially spurious extensional data of types that are commonly available. Instance-based evidence together with data profiling paves the way to inform automation in several steps within the wrangling process, specifically, matching, mapping validation, value format transformation, and data repair. The approach is evaluated with real estate data showing substantial improvements in the results of automated wrangling.
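
As a rough illustration of how instance-based evidence from data context might inform the matching step, the sketch below scores each source column against reference values attached to target attributes. It is not the paper's algorithm: the containment measure, the threshold, the function names (`normalise`, `containment`, `match_columns`) and the sample real estate values are all assumptions made for the example.

```python
def normalise(values):
    """Return the distinct values, trimmed and case-folded, as a set."""
    return {str(v).strip().casefold() for v in values if str(v).strip()}


def containment(source_values, reference_values):
    """Fraction of distinct source values that also occur in the reference data."""
    src = normalise(source_values)
    ref = normalise(reference_values)
    if not src:
        return 0.0
    return len(src & ref) / len(src)


def match_columns(source, context, threshold=0.5):
    """Map each source column to the target attribute whose context data
    best covers its values, keeping only matches at or above the threshold.

    source:  dict of source column name -> list of values
    context: dict of target attribute name -> list of reference values
    """
    matches = {}
    for column, values in source.items():
        best_attr, best_score = None, 0.0
        for attribute, reference in context.items():
            score = containment(values, reference)
            if score > best_score:
                best_attr, best_score = attribute, score
        if best_attr is not None and best_score >= threshold:
            matches[column] = (best_attr, best_score)
    return matches


# Hypothetical source table with uninformative column names.
source = {
    "col_a": ["M1 1AA", "M13 9PL", "OX1 2JD"],
    "col_b": ["Manchester", "Oxford", "manchester "],
    "col_c": ["3", "2", "4"],
}

# Hypothetical data context: reference values for two target attributes.
context = {
    "postcode": ["M1 1AA", "M13 9PL", "OX1 2JD", "EH1 1YZ"],
    "city": ["Manchester", "Oxford", "Edinburgh"],
}

matches = match_columns(source, context)
for column, (attribute, score) in matches.items():
    print(f"{column} -> {attribute} (containment {score:.2f})")
```

Containment is used here rather than a symmetric measure because reference data is typically much larger than any single source column; a column with no overlap (such as `col_c`) is simply left unmatched.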
Original language: English
Title of host publication: 2017 IEEE International Conference on Big Data (Big Data)
Publisher: IEEE
Pages: 956-963
Number of pages: 8
ISBN (Electronic): 978-1-5386-2715-0
ISBN (Print): 978-1-5386-2716-7
DOIs
Publication status: Published - 15 Jan 2018
Event: 2017 IEEE International Conference on Big Data (Big Data) - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Conference

Conference: 2017 IEEE International Conference on Big Data (Big Data)
Country/Territory: United States
City: Boston
Period: 11/12/17 to 14/12/17

Keywords

  • Data Context
  • Data Integration
  • Data Wrangling

