TY - GEN
T1 - The VADA Architecture for Cost-Effective Data Wrangling
AU - Konstantinou, Nikolaos
AU - Koehler, Martin
AU - Abel, Edward
AU - Civili, Cristina
AU - Neumayr, Bernd
AU - Sallinger, Emanuel
AU - Fernandes, Alvaro A. A.
AU - Gottlob, Georg
AU - Keane, John
AU - Libkin, Leonid
AU - Paton, Norman
PY - 2017/5
Y1 - 2017/5
N2 - Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
AB - Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
KW - Data Wrangling
U2 - 10.1145/3035918.3058730
DO - 10.1145/3035918.3058730
M3 - Conference contribution
BT - ACM SIGMOD
ER -