TY - JOUR
T1 - Processing Biological Literature with Customizable Web Services Supporting Interoperable Formats
AU - Rak, Rafal
AU - Batista-Navarro, Riza Theresa
AU - Carter, J
AU - Rowley, A
AU - Ananiadou, Sophia
N1 - The named entity recognition methods underpinning the platform described in this paper represent the system that consistently obtained the best performance (out of 12 participants) in each of the gene, chemical and disease name recognition tasks of the highly visible BioCreative IV community evaluation, applied to the curation of the Comparative Toxicogenomics Database (CTD). The strength of the proposed approach lies in mitigating the challenge posed by the lack of gold-standard, named entity-annotated corpora suitable for CTD curation. This was addressed through the use of distant supervision techniques to generate weakly labelled data. The contribution proved to be effective and further solidified the University of Manchester as a leader in the international biomedical text mining community.
PY - 2014
Y1 - 2014
N2 - Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams.
U2 - 10.1093/database/bau064
DO - 10.1093/database/bau064
M3 - Article
SN - 1758-0463
VL - 2014
JO - Database
JF - Database
ER -