Extracting Format Transformation Examples from Manual Data Corrections

Nurzety Binti Ahmad Azuan, Suzanne Embury, Norman Paton

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


One of the challenges in data analysis is the substantial cost of human involvement. Before any analysis can take place, data from heterogeneous sources needs to be cleaned, integrated and transformed into a uniform format. This task, also known as "data wrangling", often requires both technical skills and knowledge from domain experts. Because the effort spent on data wrangling, including format transformation, is usually task-dependent and often tailored to specific sources, it gives rise to a repetitive, time-consuming and labour-intensive process. Current tools support data scientists in conducting wrangling steps, such as the creation of format transformation rules, but the problem of iterative manual work to inform the creation of such rules remains. We propose an approach that observes the actions of data scientists at work correcting errors in a query result. Specifically, we aim to extract, from manual corrections carried out by data scientists, format transformation examples that can be used to synthesize format transformation programs. In so doing, the objective is to re-use information about recurring manual corrections to automate subsequent transformations. In this paper, we propose example generation and filtering techniques for extracting format transformation examples from manual corrections, and evaluate the techniques empirically on a variety of format transformation tasks.
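To make the idea of example generation and filtering concrete, the following is a minimal Python sketch and not the authors' technique. It assumes that the query result is available as a list of rows before and after the manual corrections, with rows aligned by position. The function names `extract_examples` and `filter_examples`, and the filtering heuristic (drop duplicates and inputs that were corrected to conflicting outputs), are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Example = Tuple[str, str]  # (value before correction, value after correction)


def extract_examples(before: List[Row], after: List[Row]) -> Dict[str, List[Example]]:
    """Collect, per column, an (old, new) pair for every cell that was changed.

    Assumption: rows in `before` and `after` are aligned by position.
    """
    examples: Dict[str, List[Example]] = {}
    for old_row, new_row in zip(before, after):
        for column, old_value in old_row.items():
            new_value = new_row.get(column, old_value)
            if new_value != old_value:
                examples.setdefault(column, []).append((old_value, new_value))
    return examples


def filter_examples(pairs: List[Example]) -> List[Example]:
    """Hypothetical filter: keep one copy of each pair and drop any input
    value that was corrected to more than one distinct output."""
    outputs: Dict[str, Set[str]] = {}
    for old, new in pairs:
        outputs.setdefault(old, set()).add(new)

    kept: List[Example] = []
    seen: Set[Example] = set()
    for pair in pairs:
        old, _ = pair
        if len(outputs[old]) == 1 and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


if __name__ == "__main__":
    # Illustrative data only: a date column corrected by hand.
    before = [{"date": "2018/09/02"}, {"date": "2018/09/05"}, {"date": "2018/09/02"}]
    after = [{"date": "2018-09-02"}, {"date": "2018-09-05"}, {"date": "2018-09-02"}]
    for column, pairs in extract_examples(before, after).items():
        print(column, filter_examples(pairs))
```

The surviving pairs could then be passed as input-output examples to a programming-by-example synthesizer. Which synthesizer is used, and how corrections are actually observed, is outside the scope of this sketch.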
Original language: English
Title of host publication: New Trends in Databases and Information Systems
Subtitle of host publication: ADBIS 2018 Short Papers and Workshops, AI*QA, BIGPMED, CSACDB, M2U, BigDataMAPS, ISTREND, DC, Budapest, Hungary, September, 2-5, 2018, Proceedings
Publisher: Springer Nature
Number of pages: 8
ISBN (Electronic): 978-3-030-00063-9
ISBN (Print): 978-3-030-00062-2
Publication status: Published - 2018


  • Data wrangling
  • format transformation
  • data integration
  • user feedback
  • implicit feedback

