Developing a robust part-of-speech tagger for biomedical text

Yoshimasa Tsuruoka, Yuka Tateishi, Jin Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, Jun'ichi Tsujii

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    Abstract

    This paper presents a part-of-speech tagger which is specifically tuned for biomedical text. We have built the tagger with maximum entropy modeling and a state-of-the-art tagging algorithm. The tagger was trained on a corpus containing newspaper articles and biomedical documents so that it would work well on various types of biomedical text. Experimental results on the Wall Street Journal corpus, the GENIA corpus, and the PennBioIE corpus revealed that adding training data from a different domain does not hurt the performance of a tagger, and our tagger exhibits very good precision (97% to 98%) on all these corpora. We also evaluated the robustness of the tagger using recent MEDLINE articles. © Springer-Verlag Berlin Heidelberg 2005.
    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Lect. Notes Comput. Sci.
    Pages: 382-392
    Number of pages: 10
    Volume: 3746
    Publication status: Published - 2005
    Event: 10th Panhellenic Conference on Informatics, PCI 2005 - Volos
    Duration: 1 Jul 2005 → …

    Publication series

    Name: Advances in Informatics - 10th Panhellenic Conference on Informatics

    Other

    Other: 10th Panhellenic Conference on Informatics, PCI 2005
    City: Volos
    Period: 1/07/05 → …
