The Value of an In-Domain Lexicon in Genomics QA

S. Ananiadou, Y. Sasaki, J. McNaught

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    Abstract

    This paper demonstrates that a large-scale lexicon tailored for the biology domain is effective in improving question analysis for genomics Question Answering (QA). We use the TREC Genomics Track data to evaluate the performance of different question analysis methods. It is hard to process textual information in biology, especially in molecular biology, due to a huge number of technical terms which rarely appear in general English documents and dictionaries. To support biological Text Mining, we have developed a domain-specific resource, the BioLexicon. Started in 2006 from scratch, this lexicon currently includes more than four million biomedical terms consisting of newly curated terms and terms collected from existing biomedical databases. While conventional genomics IR/QA systems provide query expansion based on thesauri and dictionaries, it is not clear to what extent a biology-oriented lexical resource is effective for question pre-processing for genomics QA. Experiments on the genomics QA data set show that question analysis using the BioLexicon performs slightly better than that using n-grams and the UMLS Specialist Lexicon.
    Original language: English
    Pages: 47-55
    Number of pages: 9
    Publication status: Published - 2009
    Event: 3rd International Symposium on Languages in Biology and Medicine (LBM-2009)
    Duration: 1 Jan 1824 → …

