Automatic extraction of microorganisms and their habitats from free text using text mining workflows.

Balakrishna Kolluru, Sirintra Nakjang, Robert P. Hirt, Anil Wipat, Sophia Ananiadou

    Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

    Abstract

    In this paper we illustrate the use of text mining workflows to automatically extract instances of microorganisms and their habitats from free text; these entries can then be curated and added to different databases. To this end, we use a Conditional Random Field (CRF) based classifier, as part of the workflows, to extract mentions of microorganisms and habitats, and the relations between organisms and their habitats. Results indicate good performance for microorganism extraction and for the relation extraction aspects of the task (precision above 80%), while habitat recognition is only moderate (precision of about 65%). We also conjecture that PDF-to-text conversion can be quite noisy and that this implicitly affects any sentence-based relation extraction algorithm.
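The conjecture about conversion noise can be illustrated with a minimal sketch. It is not the authors' workflow: the toy lexicons below are a hypothetical stand-in for the output of a trained CRF tagger, the sentence splitter is deliberately naive, and the "noisy" text simulates a single spurious full stop of the kind a PDF-to-text converter might introduce. All names (`ORGANISMS`, `HABITATS`, `candidate_pairs`, `precision`) are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicons standing in for the mentions a trained CRF tagger would produce.
ORGANISMS = {"Escherichia coli", "Trichomonas vaginalis"}
HABITATS = {"human gut", "urogenital tract"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_pairs(text):
    """Return (organism, habitat) pairs whose mentions co-occur in one sentence."""
    pairs = set()
    for sentence in split_sentences(text):
        orgs = [o for o in ORGANISMS if o in sentence]
        habs = [h for h in HABITATS if h in sentence]
        pairs.update((o, h) for o in orgs for h in habs)
    return pairs


def precision(predicted, gold):
    """Fraction of predicted pairs that also appear in the gold set."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


clean = (
    "Escherichia coli is common in the human gut. "
    "Trichomonas vaginalis infects the urogenital tract."
)
# Simulated conversion noise: a spurious full stop splits the first sentence in two.
noisy = (
    "Escherichia coli is common in the. human gut. "
    "Trichomonas vaginalis infects the urogenital tract."
)
gold = {
    ("Escherichia coli", "human gut"),
    ("Trichomonas vaginalis", "urogenital tract"),
}

clean_pairs = candidate_pairs(clean)
noisy_pairs = candidate_pairs(noisy)
print(len(clean_pairs), len(noisy_pairs))
```

In this toy setting the spurious break separates the organism from its habitat, so the sentence-level pairing silently loses that relation even though both mentions are still recognised individually.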
    Original language: English
    Title of host publication: Journal of integrative bioinformatics | J Integr Bioinform
    Pages: 184
    Volume: 8
    Publication status: Published - 2011
    Event: International Symposium on Integrative Bioinformatics
    Duration: 1 Jan 1824 → …

    Conference

    Conference: International Symposium on Integrative Bioinformatics
    Period: 1/01/24 → …
