Joint Arabic segmentation and part-of-speech tagging

John McNaught, Shabib AlGahtani

    Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

    Abstract

    Arabic has a very complex morphological system, though a very structured one. Character patterns are often indicative of word class and word segmentation. In this paper, we explore a novel approach to Arabic word segmentation and part-of-speech tagging that relies on character information. The approach is lexicon-free and does not require any morphological analysis, eliminating the factor of dictionary coverage. Using character-based analysis, the developed system yielded state-of-the-art accuracy, comparing favourably with other taggers that involve external resources.
    Original language: English
    Title of host publication: Proceedings of the Second Workshop on Arabic Natural Language Processing
    Editors: Nizar Habash, Stephan Vogel, Kareem Darwish
    Place of Publication: Stroudsburg, USA
    Publisher: Association for Computational Linguistics
    Pages: 108-117
    Number of pages: 10
    ISBN (Print): 978-1-941643-58-7
    Publication status: Published - 30 Jul 2015
    Event: ACL Second Workshop on Arabic Natural Language Processing (WANLP 2015) - Beijing, China
    Duration: 30 Jul 2015 to 30 Jul 2015
    https://aclweb.org/anthology/W/W15/W15-3212.pdf

    Conference

    Conference: ACL Second Workshop on Arabic Natural Language Processing (WANLP 2015)
    City: Beijing, China
    Period: 30/07/15 to 30/07/15

    Keywords

    • part-of-speech tagging
    • natural language processing
    • computational linguistics
    • Arabic
    • Arabic Treebank
    • corpus linguistics
