Cracking Multi-Word Expressions in a nutshell: Neural MT walks on thin ice when facing MWEs

Activity: Talk or presentation; Invited talk; Research


Invited talk at the IMS-NLP Institute, University of Stuttgart, hosted by Ms. Prisca Piccirilli (PhD candidate) and Prof. Sabine Schulte im Walde.
In this talk, I will present work on multi-word expressions (MWEs), covering both MWE identification and MWE translation. The presentation covers three topics: 1) our participation in MWE identification at MWE2017@EACL; 2) our neural machine translation (NMT) models, which address MWE translation from a low-frequency-term perspective for EN-DE/ZH, including bilingual MWE term extraction and its use to augment the training data, as well as the decomposition of Chinese characters/symbols into lower-level representations; and 3) our preparation of a multilingual parallel corpus, built from the PARSEME English seed corpus with its verbal MWE annotations and extended into DE/ZH/PL and other languages still in progress. For the third topic, we give many examples of where NMT went wrong when translating MWE-related content and MWE terms, and we try to classify these errors into types to facilitate further research.
By the end of the talk, it should be clear that now that MT has reached a new level of quality via NMT, MWEs have become a very apparent bottleneck for MT researchers.
We also propose some possible solutions to address these issues.

Period: 11 Oct 2022
Held at: University of Stuttgart, Germany
Degree of Recognition: International


  • multiword expressions
  • machine translation
  • natural language processing
  • translation evaluation