The VADA Architecture for Cost-Effective Data Wrangling

Nikolaos Konstantinou, Martin Koehler, Edward Abel, Cristina Civili, Bernd Neumayr, Emanuel Sallinger, Alvaro A. A. Fernandes, Georg Gottlob, John Keane, Leonid Libkin, Norman Paton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
Original language: English
Title of host publication: ACM SIGMOD
Publication status: Published - May 2017

Keywords

  • Data Wrangling
