A Universal Phrase Tagset for Multilingual Treebanks

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


Many syntactic treebanks and parser toolkits have been developed over the past twenty years, including dependency structure parsers and phrase structure parsers. Phrase structure parsers usually use different phrase tagsets for different languages, which is inconvenient for multilingual research. This paper designs a refined universal phrase tagset that contains 9 commonly used phrase categories. The mapping to this universal tagset covers 25 constituent treebanks in 21 languages. The experiments show that the universal phrase tagset generally reduces the cost of the parsing models and can even improve parsing accuracy.
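The idea of mapping treebank-specific phrase labels onto a small shared tagset can be illustrated with a minimal sketch. The label names in `ILLUSTRATIVE_MAPPING`, the fallback tag `X`, and the helper functions `to_universal` and `relabel` are all assumptions made for illustration; they are not taken from the paper, whose actual 9 categories and per-treebank mappings are not listed in this record.

```python
import re

# Hypothetical mapping from Penn-Treebank-style phrase labels to a small
# shared tagset. Illustrative only: NOT the mapping published in the paper.
ILLUSTRATIVE_MAPPING = {
    "NP": "NP", "WHNP": "NP", "NX": "NP",
    "VP": "VP",
    "ADJP": "AJP", "WHADJP": "AJP",
    "ADVP": "AVP", "WHADVP": "AVP",
    "PP": "PP", "WHPP": "PP",
    "S": "S", "SBAR": "S", "SINV": "S", "SQ": "S", "SBARQ": "S",
    "CONJP": "CONJP",
}
FALLBACK = "X"  # assumed catch-all for labels with no mapping entry


def to_universal(label, mapping=ILLUSTRATIVE_MAPPING):
    """Map one phrase label to the shared tagset.

    Function tags and indices (e.g. "NP-SBJ", "WHADVP-1", "NP=2") are
    stripped before the lookup.
    """
    base = re.split(r"[-=]", label, maxsplit=1)[0]
    return mapping.get(base, FALLBACK)


def relabel(node, mapping=ILLUSTRATIVE_MAPPING):
    """Relabel the phrase nodes of a tree given as nested tuples.

    A phrase node is (label, child, child, ...); a preterminal is
    (pos_tag, word). POS tags and words are left unchanged.
    """
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return node  # preterminal: keep the POS tag and the word
    return (to_universal(label, mapping),
            *[relabel(child, mapping) for child in children])


tree = ("S",
        ("NP-SBJ", ("PRP", "She")),
        ("VP", ("VBZ", "reads"), ("NP", ("NNS", "books"))))
print(relabel(tree))
```

With a table like this per treebank, trees from different languages end up with the same phrase label inventory, which is the convenience the abstract describes for multilingual experiments.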
Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
Subtitle of host publication: 13th China National Conference, CCL 2014, and First International Symposium, NLP-NABD 2014, Wuhan, China, October 18-19, 2014. Proceedings
Editors: Maosong Sun, Yang Liu, Jun Zhao
Place of Publication: Cham
Publisher: Springer Cham
Number of pages: 12
ISBN (Electronic): 9783319122779
ISBN (Print): 9783319122762
Publication status: Published - 24 Sept 2014

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


  • Chinese word segmentation
  • information retrieval
  • machine translation
  • natural language understanding
  • text mining

