Towards Automatic Data Format Transformations: Data Wrangling at Scale

Alex Bogatu, Norman Paton, Alvaro Fernandes

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Data wrangling is the process whereby data is cleaned and
integrated for analysis. Data wrangling, even with tool support, is typically
a labour-intensive process. One aspect of data wrangling involves
carrying out format transformations on attribute values, for example
so that names or phone numbers are represented consistently. Recent
research has developed techniques for synthesising format transformation
programs from examples of the source and target representations.
This is valuable, but still requires a user to provide suitable examples,
something that may be challenging in applications in which there are
huge data sets or numerous data sources. In this paper we investigate
the automatic discovery of examples that can be used to synthesise format
transformation programs. In particular, we propose an approach to
identifying candidate data examples and validating the transformations
that are synthesised from them. The approach is evaluated empirically
using data sets from open government data.
Original language: English
Title of host publication: Data Analytics - 31st British International Conference on Databases, BICOD 2017, London, UK, July 10-12, 2017, Proceedings
Editors: Andrea Cali, Peter Wood, Nigel Martin, Alexandra Poulovassilis
Publisher: Springer Nature
Pages: 36-48
Number of pages: 13
ISBN (Print): 978-3-319-60794-8
Publication status: Published - 2017

Publication series

Name: Lecture Notes in Computer Science
Volume: 10375
