Cost–effective Variational Active Entity Resolution

Alex Bogatu, Norman Paton, Mark Douthwaite, Stuart Davie, Andre Freitas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Accurately identifying different representations of the same real–world entity is an integral part of data cleaning and many methods have been proposed to accomplish it. The challenges of this entity resolution task that demand so much research attention are often rooted in the task–specificity and user–dependence of the process. Adopting deep learning techniques has the potential to lessen these challenges. In this paper, we set out to devise an entity resolution method that builds on the robustness conferred by deep autoencoders to reduce human–involvement costs. Specifically, we reduce the cost of training deep entity resolution models by performing unsupervised representation learning. This unveils a transferability property of the resulting model that can further reduce the cost of applying the approach to new datasets by means of transfer learning. Finally, we reduce the cost of labeling training data through an active learning approach that builds on the properties conferred by the use of deep autoencoders. Empirical evaluation confirms the accomplishment of our cost–reduction desideratum, while achieving comparable effectiveness with state–of–the–art alternatives.
Original language: English
Title of host publication: 37th IEEE International Conference on Data Engineering
Publication status: Accepted/In press - 12 Feb 2021
Event: 37th IEEE International Conference on Data Engineering - Chania, Greece
Duration: 19 Apr 2021 to 22 Apr 2021

Conference

Conference: 37th IEEE International Conference on Data Engineering
Country/Territory: Greece
City: Chania
Period: 19/04/21 to 22/04/21
