Probabilistic Approaches to Overcome Content Heterogeneity in Data Integration: A Study Case in Systematic Lupus Erythematosus

MASTERPLANS Consortium

Research output: Contribution to journal › Article › peer-review

Abstract

Integrating data from different sources into a homogeneous dataset increases the opportunities to study human health. However, disparate data collections are often heterogeneous, which complicates their integration. In this paper, we focus on the issue of content heterogeneity in data integration. Traditional approaches to resolving content heterogeneity map all source datasets to a common data model that includes only shared data items, and thus omit all items that vary between datasets. Using an example of three datasets in Systemic Lupus Erythematosus, we describe and experimentally evaluate a probabilistic data integration approach that propagates the uncertainty resulting from content heterogeneity into statistical inference, avoiding the need to map to a common data model.

Original language: English
Pages (from-to): 387-391
Number of pages: 5
Journal: Studies in Health Technology and Informatics
Volume: 270
Publication status: Published - 16 Jun 2020

Keywords

  • Biomedical data harmonisation
  • Content heterogeneity
  • Missing data
  • Probabilistic data integration
