Learning to extract relations for protein annotation

Jee Hyub Kim, Alex Mitchell, Teresa K. Attwood, Melanie Hilario

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Motivation: Protein annotation is the task of describing a protein X in terms of a topic Y; usually, this description is constructed from information in the biomedical literature. Until now, most literature-based protein annotation has been done manually by human annotators. However, as the number of biomedical papers grows ever more rapidly, manual annotation becomes more difficult, and there is an increasing need to automate the process. Recently, information extraction (IE) has been used to address this problem. Typically, IE requires pre-defined relations and either hand-crafted IE rules or annotated corpora, and these requirements are difficult to satisfy in real-world scenarios such as the biomedical domain. In this article, we describe an IE system that requires only sentences that domain experts have labelled as relevant or irrelevant to a given topic.

    Results: We applied our system to the annotation needs of a well-known protein family database; the results show that our IE system can annotate proteins with a set of extracted relations by learning relations and IE rules for disease, function and structure from relevant and irrelevant sentences alone. © 2007 The Author(s).
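The abstract's key claim is that sentence-level relevance labels can provide enough supervision, without pre-defined relations or annotated corpora. To give a rough intuition for why such labels carry signal, the sketch below scores words by how much more often they occur in relevant than in irrelevant sentences, using smoothed log-odds. This is a generic frequency-contrast heuristic. It is not the authors' relation- and rule-learning method. The function names, the toy sentences and the smoothing constant `alpha` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Return the set of lowercase word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def rank_topic_terms(labelled, top_k=5, alpha=1.0):
    """Rank words by smoothed log-odds of occurring in relevant vs irrelevant sentences.

    Hypothetical illustration, not the paper's algorithm.
    labelled: list of (sentence, is_relevant) pairs.
    alpha: additive (Laplace) smoothing constant.
    """
    rel_counts, irr_counts = Counter(), Counter()
    n_rel = n_irr = 0
    for sentence, is_relevant in labelled:
        tokens = tokenize(sentence)  # each word counted at most once per sentence
        if is_relevant:
            rel_counts.update(tokens)
            n_rel += 1
        else:
            irr_counts.update(tokens)
            n_irr += 1
    scores = {}
    for word in set(rel_counts) | set(irr_counts):
        # Smoothed fraction of relevant / irrelevant sentences containing the word
        p_rel = (rel_counts[word] + alpha) / (n_rel + 2 * alpha)
        p_irr = (irr_counts[word] + alpha) / (n_irr + 2 * alpha)
        scores[word] = math.log(p_rel / p_irr)
    # Highest score first; ties broken alphabetically for a deterministic order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical sentences labelled for the topic "disease"
examples = [
    ("Mutations in this protein cause disease.", True),
    ("Loss of the protein leads to disease.", True),
    ("The disease is linked to this protein.", True),
    ("The protein was purified from yeast.", False),
    ("The protein binds ATP.", False),
]

for word, score in rank_topic_terms(examples, top_k=3):
    print(word, round(score, 2))
```

On this toy data, "disease" ranks first because it occurs in every relevant sentence and in no irrelevant one. Function words such as "this" also score highly, which is one reason a real system needs more than raw word contrast.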
    Original language: English
    Pages (from-to): 256-263
    Number of pages: 7
    Journal: Bioinformatics
    Volume: 23
    Issue number: 13
    Publication status: Published 1 Jul 2007
