An investigation into multi-word expressions in machine translation

Research output: Thesis, Doctoral Thesis

Abstract

Multi-word expressions (MWEs) present challenges in natural language processing and computational linguistics because of their widespread use, rich variety, idiomaticity, and non-decompositionality. Handling them well is a typical expectation in machine translation (MT), where algorithms must translate automatically from one human language to another while producing high-quality output that is adequate and fluent and that either preserves the style of the source or makes creative and correct stylistic choices. In this thesis, we carry out an extensive investigation into MWEs in neural MT. First, we review the relevant literature; this includes experimental work re-examining state-of-the-art models that integrate knowledge of MWEs into MT systems, applied to new language pairs to see what gaps exist in the published literature. Second, we propose new models for MWE translation. One design treats MWE translation as a problem of translating low-frequency words and phrases and integrates language-specific features, such as stroke and radical representations of Chinese characters, into the learning model, with the expectation that this will improve accuracy. Third, properly examining how different MT models perform on MWEs requires a new evaluation methodology, so we create a multilingual parallel corpus with MWE annotations (AlphaMWE). While building this corpus, we classify the MT issues on MWE-related content into several categories, with the expectation that this will help future MT researchers focus on one or more of them in order to reach a new state of the art in MT performance and ultimately move towards human parity. Finally, we propose a new methodology for human-in-the-loop MT evaluation with MWE considerations (HiLMeMe).
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution: Dublin City University
Award date: 1 Jan 2022
Place of Publication: Dublin, Ireland
Publication status: Published - 4 Jan 2022
