TY - JOUR
T1 - Semantic data integration for Francisella tularensis novicida proteomic and genomic data
AU - Anwar, Nadia
AU - Hunt, Ela
AU - Kolch, Walter
AU - Pitt, Andrew
PY - 2009
Y1 - 2009
AB - This paper summarises the lessons and experiences gained from a case study applying semantic web technologies to the integration of data on the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, because multiple laboratories across the world, using multiple technologies, perform experiments to understand the mechanism of virulence. Such data are hard to integrate, and this work examines the role of explicitly provided data semantics in data integration. We test whether semantic web technologies can be used to reveal previously unknown connections across the available Fn datasets. We combined these data with genome data and with public domain annotations from GO, KEGG and the SUPERFAMILY database. Through the resulting connected graph of database cross-references, we extended the annotations of an experimental data set by superimposing the annotation graph onto it. Identifiers used in the experimental data were resolved automatically, and the data acquired annotations from the rest of the RDF graph. This happened without the expensive manual annotation that would normally be required to produce these links. Other lessons learnt and future challenges arising from this work are also presented in detail.
UR - http://www.scopus.com/inward/record.url?scp=84885805192&partnerID=8YFLogxK
UR - https://ceur-ws.org/Vol-435/
U2 - https://ceur-ws.org/Vol-435/paper05.pdf
M3 - Conference article
AN - SCOPUS:84885805192
SN - 1613-0073
VL - 435
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2008
Y2 - 28 November 2008 through 28 November 2008
ER -