TY - JOUR
T1 - Automatic extraction of microorganisms and their habitats from free text using text mining workflows.
AU - Kolluru, Balakrishna
AU - Nakjang, Sirintra
AU - Hirt, Robert P.
AU - Wipat, Anil
AU - Ananiadou, Sophia
PY - 2011
Y1 - 2011
AB - In this paper we illustrate the use of text mining workflows to automatically extract instances of microorganisms and their habitats from free text; these entries can then be curated and added to different databases. To this end, we use a Conditional Random Field (CRF) based classifier, as part of the workflows, to extract mentions of microorganisms and habitats, as well as the relations between organisms and their habitats. Results indicate good performance for the microorganism extraction and relation extraction aspects of the task (with a precision of over 80%), while habitat recognition is only moderate (a precision of about 65%). We also conjecture that PDF-to-text conversion can be quite noisy, which implicitly affects any sentence-based relation extraction algorithm.
M3 - Article
SN - 1613-4516
VL - 8
SP - 184
JO - Journal of integrative bioinformatics
JF - Journal of integrative bioinformatics
IS - 2
ER -