Quantifying and Propagating Uncertainty in Automated Linked Data Integration

Klitos Christodoulou, Fernando Rene Sanchez Serrano, Alvaro A. A. Fernandes, Norman W. Paton

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

The Web of Data consists of numerous Linked Data (LD) sources from many largely independent publishers, giving rise to the need for data integration at scale. At this scale, automation can provide candidate integrations that underpin a pay-as-you-go approach. However, automated approaches need: (i) to operate across several data integration steps; (ii) to build on diverse sources of evidence; and (iii) to contend with uncertainty. This paper describes the construction of probabilistic models that yield degrees of belief both on the equivalence of real-world concepts and on the ability of mapping expressions to return correct results. The paper shows how such models can underpin a Bayesian approach to assimilating different forms of evidence: syntactic (in the form of similarity scores derived by string-based matchers), semantic (in the form of semantic annotations stemming from LD vocabularies), and internal (in the form of fitness values for candidate mappings). The paper presents an empirical evaluation of the methodology against equivalence and correctness judgements made by human experts. The evaluation confirms that the proposed Bayesian methodology is suitable as a generic, principled approach for quantifying and assimilating different pieces of evidence throughout the various phases of an automated data integration process.
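The Bayesian assimilation of evidence mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function names, the prior of 0.5, and the likelihood values are all hypothetical, and the two pieces of evidence are assumed to be conditionally independent given the hypothesis. It only shows how a degree of belief in "these two concepts are equivalent" might be revised as syntactic and then semantic evidence arrives.

```python
from typing import Iterable, Tuple


def bayes_update(prior: float, lik_equiv: float, lik_not_equiv: float) -> float:
    """Return P(equivalent | evidence) using Bayes' theorem.

    lik_equiv     = P(evidence | concepts are equivalent)
    lik_not_equiv = P(evidence | concepts are not equivalent)
    """
    numerator = lik_equiv * prior
    denominator = numerator + lik_not_equiv * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


def assimilate(prior: float, evidence: Iterable[Tuple[float, float]]) -> float:
    """Apply bayes_update once per piece of evidence, in order.

    Assumes the pieces of evidence are conditionally independent
    given the hypothesis (an assumption made for this sketch).
    """
    belief = prior
    for lik_equiv, lik_not_equiv in evidence:
        belief = bayes_update(belief, lik_equiv, lik_not_equiv)
    return belief


# Hypothetical likelihoods, chosen only for illustration:
# a high string-similarity score, then a supporting vocabulary annotation.
syntactic = (0.8, 0.2)
semantic = (0.9, 0.3)

posterior = assimilate(0.5, [syntactic, semantic])
print(f"degree of belief in equivalence: {posterior:.3f}")
```

With these made-up numbers the belief rises from 0.5 to 0.8 after the syntactic evidence and to roughly 0.923 after the semantic evidence. In practice the likelihoods would have to be estimated from data rather than chosen by hand.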
Original language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVII
Number of pages: 31
Publication status: Published - 2018

Publication series: Lecture Notes in Computer Science

