Abstract
Arabic has a very complex morphological system, though a very structured one. Character patterns are often indicative of word class and word segmentation. In this paper, we explore a novel approach to Arabic word segmentation and part-of-speech tagging relying on character information. The approach is lexicon-free and does not require any morphological analysis, eliminating the factor of dictionary coverage. Using character-based analysis, the developed system yielded state-of-the-art accuracy, comparing favourably with other taggers that involve external resources.
Original language | English |
---|---|
Title of host publication | Proceedings of the Second Workshop on Arabic Natural Language Processing |
Editors | Nizar Habash, Stephan Vogel, Kareem Darwish |
Place of Publication | Stroudsburg, USA |
Publisher | Association for Computational Linguistics |
Pages | 108-117 |
Number of pages | 10 |
ISBN (Print) | 978-1-941643-58-7 |
Publication status | Published - 30 Jul 2015 |
Event | ACL Second Workshop on Arabic Natural Language Processing (WANLP 2015) - Beijing, China. Duration: 30 Jul 2015 → 30 Jul 2015. https://aclweb.org/anthology/W/W15/W15-3212.pdf |
Conference
Conference | ACL Second Workshop on Arabic Natural Language Processing (WANLP 2015) |
---|---|
City | Beijing, China |
Period | 30/07/15 → 30/07/15 |
Keywords
- part-of-speech tagging
- natural language processing
- computational linguistics
- Arabic
- Arabic Treebank
- corpus linguistics