Semantic data integration for Francisella tularensis novicida proteomic and genomic data

Nadia Anwar, Ela Hunt, Walter Kolch, Andrew Pitt

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, using multiple technologies, perform experiments to understand the mechanism of virulence. Such data are hard to integrate, and this work examines the role of explicitly provided data semantics in data integration. We test whether semantic web technologies can be used to reveal previously unknown connections across the available Fn datasets. We combined these data with genome data and with public domain annotations within GO, KEGG and the SUPERFAMILY database. Through this connected graph of database cross references, we extended the annotations of an experimental data set by superimposing the annotation graph onto it. Identifiers used in the experimental data were resolved automatically, and the data acquired annotations from the rest of the RDF graph. This happened without the expensive manual annotation that would normally be required to produce these links. Other lessons learnt and future challenges that result from this work are also presented in detail.
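
The superimposition step described in the abstract can be illustrated with a minimal sketch. It assumes graphs are plain Python sets of (subject, predicate, object) triples rather than a real RDF store, and every identifier and predicate name below (`ex:spot42`, `gene:HYPOTHETICAL_1`, `ex:identifiedAs`, `ex:xref`) is a made-up placeholder, not taken from the paper or from any Fn dataset. It only shows why shared identifiers are enough for experimental records to pick up annotations once the graphs are merged.

```python
# Toy RDF-style graphs: each graph is a set of (subject, predicate, object) triples.
# All identifiers and predicates are hypothetical placeholders, not real Fn data.

experiment = {
    ("ex:spot42", "ex:identifiedAs", "gene:HYPOTHETICAL_1"),
    ("ex:spot43", "ex:identifiedAs", "gene:HYPOTHETICAL_2"),
}

annotations = {
    ("gene:HYPOTHETICAL_1", "ex:xref", "go:TERM_A"),
    ("gene:HYPOTHETICAL_1", "ex:xref", "kegg:PATHWAY_B"),
}


def merge(*graphs):
    """Superimpose graphs: identical identifiers become the same node."""
    return set().union(*graphs)


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def annotations_for(graph, record):
    """Follow record -> gene -> cross reference links in the given graph."""
    found = set()
    for gene in objects(graph, record, "ex:identifiedAs"):
        found |= objects(graph, gene, "ex:xref")
    return found


merged = merge(experiment, annotations)

# Before merging, the experimental record has no annotations at all;
# after merging, the shared gene identifier connects it to them.
print(annotations_for(experiment, "ex:spot42"))
print(sorted(annotations_for(merged, "ex:spot42")))
```

No link between the experimental record and its annotations is ever written by hand here: the connection exists only because both graphs use the same identifier for the gene, which is the effect the abstract attributes to the merged RDF graph.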

Original language: English
Number of pages: 17
Journal: CEUR Workshop Proceedings
Volume: 435
Publication status: Published - 2009
Event: Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2008 - Edinburgh, United Kingdom
Duration: 28 Nov 2008
