Semantic reranking of CRF label sequences for verbal multiword expression identification

Erwan Moreau, Ashjan Alsulaimani, Alfredo Maldonado, Lifeng Han, Carl Vogel, Koel Dutta Chowdhury

Research output: Chapter in Book/Conference proceeding › Chapter › peer-review

Abstract

Verbal multiword expression (VMWE) identification can be addressed successfully as a sequence labelling problem with conditional random fields (CRFs), by returning the single label sequence with maximal probability. This work describes a system that reranks the 10 most likely CRF candidate VMWE label sequences using a decision tree regression model. The reranker aims to operationalise the intuition that a non-compositional MWE can exhibit distributional behaviour different from that of its constituent words. For this reason, it uses semantic features based on comparing the context vector of a candidate expression against those of its constituent words. However, not all VMWEs are non-compositional, and analysis shows that non-semantic features also play an important role in the behaviour of the reranker. In fact, the analysis shows that combining the sequential approach of the CRF component with the context-based approach of the reranker is the main source of improvement: the reranker achieves a 12% macro-average F1-score improvement over the basic CRF method, as measured on data from the PARSEME shared task on VMWE identification.
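The reranking idea in the abstract can be illustrated with a small sketch. It is not the authors' system: the function names (`context_vector`, `rerank`, `toy_score`) are hypothetical, context vectors are plain co-occurrence counts in a ±2-token window, only contiguous expressions are handled (real VMWEs can be discontinuous), and the trained decision tree regressor is replaced by a hand-set scoring function. The only semantic feature compares each candidate expression's context vector with those of its constituent words, so a low similarity counts as evidence of non-compositionality.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c * v[w] for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def context_vector(corpus, target, window=2):
    """Count words within `window` tokens of each contiguous occurrence of `target` (a tuple)."""
    vec = Counter()
    n = len(target)
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == target:
                vec.update(sent[max(0, i - window):i] + sent[i + n:i + n + window])
    return vec

def constituent_similarity(corpus, expr):
    """Mean cosine between the expression's context vector and each constituent word's."""
    expr_vec = context_vector(corpus, expr)
    sims = [cosine(expr_vec, context_vector(corpus, (w,))) for w in expr]
    return sum(sims) / len(sims)

def toy_score(crf_prob, similarity):
    """HYPOTHETICAL stand-in for the trained regressor: rewards CRF probability
    and low expression/constituent similarity (non-compositionality)."""
    return crf_prob + (1.0 - similarity)

def rerank(corpus, candidates):
    """candidates: (crf_prob, [expression tuples]) pairs from the CRF k-best list.
    Returns (score, crf_prob, expressions) triples sorted by descending score."""
    scored = []
    for prob, exprs in candidates:
        # A candidate with no labelled expression gets no non-compositionality bonus.
        sim = min(constituent_similarity(corpus, e) for e in exprs) if exprs else 1.0
        scored.append((toy_score(prob, sim), prob, exprs))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored

# Toy example: the CRF prefers "no VMWE", but "take place" behaves unlike "take" and "place".
corpus = [
    "the meeting will take place tomorrow".split(),
    "the concert will take place tonight".split(),
    "they take the bus daily".split(),
    "a quiet place to read".split(),
]
candidates = [
    (0.6, []),                    # CRF 1-best: no expression labelled
    (0.4, [("take", "place")]),   # CRF 2nd-best: "take place" labelled as a VMWE
]
for score, prob, exprs in rerank(corpus, candidates):
    print(round(score, 3), prob, exprs)
```

With this toy data the second CRF candidate moves to the top, because the context vector of "take place" is only moderately similar to those of "take" and "place". In the real setting, the hand-set `toy_score` would be replaced by a regressor trained on many semantic and non-semantic features.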
Original language: English
Title of host publication: Multiword expressions at length and in depth
Subtitle of host publication: Extended papers from the MWE 2017 workshop
Editors: Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze
Place of publication: Berlin
Publisher: Language Science Press
Chapter: 6
Pages: 177-207
Number of pages: 31
ISBN (Electronic): 9783961101238
Publication status: Published - Oct 2018
