MMT’s Submission for the WMT 2023 Quality Estimation Shared Task

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents our submission to the WMT 2023 Quality Estimation (QE) shared task, Task 1 (sentence-level subtask). We propose a straightforward training data augmentation approach aimed at improving the correlation between QE model predictions and human quality assessments. Using eleven data augmentation methods and six language pairs, we systematically create augmented training sets by applying each method individually to the original training set of each language pair. We assess the effectiveness of each method by the performance gap, measured on the development set, between the model before and after training on the augmented dataset. Experimental results show that synonym replacement via the Paraphrase Database (PPDB) yields the largest performance boost for English-German, English-Marathi and English-Gujarati, while for the remaining language pairs, methods such as word insertion based on contextual word embeddings, back translation, and direct paraphrasing prove more effective. Training the model on a larger and more diverse set of samples brings further improvements for certain language pairs, but the gains are marginal and do not hold for every pair. For the submission, we generate test-set predictions for each language pair with the model trained on the augmented dataset built using that pair's most effective method, except for English-German. Although not highly competitive, our system surpasses the baseline on most language pairs and ranks third for English-Marathi.
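The per-language-pair selection step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `select_best_method`, the language-pair keys, the method labels, and all scores are hypothetical placeholders, and the development-set score is assumed to be a single correlation value per model.

```python
from typing import Dict, Optional, Tuple


def select_best_method(
    baseline: float, augmented: Dict[str, float]
) -> Tuple[Optional[str], float]:
    """Pick the augmentation method with the largest dev-set gain.

    `baseline` is the dev-set correlation of the model trained on the
    original data; `augmented` maps each method name to the dev-set
    correlation after training on that method's augmented set.
    Returns (None, 0.0) when no method improves on the baseline.
    """
    best_method: Optional[str] = None
    best_gain = 0.0
    for method, score in augmented.items():
        gain = score - baseline  # performance gap before vs. after augmentation
        if gain > best_gain:
            best_method, best_gain = method, gain
    return best_method, best_gain


# Placeholder numbers for illustration only; these are NOT results from the paper.
dev_scores = {
    "en-xx": {"baseline": 0.50, "ppdb_synonym": 0.53, "back_translation": 0.52},
    "en-yy": {"baseline": 0.40, "ppdb_synonym": 0.39, "back_translation": 0.44},
}

choices = {}
for pair, scores in dev_scores.items():
    per_method = dict(scores)              # copy so the input is not mutated
    baseline = per_method.pop("baseline")  # separate baseline from augmented runs
    choices[pair] = select_best_method(baseline, per_method)
```

Under these assumptions, `choices` maps each language pair to the method whose augmented training set gave the largest development-set improvement, together with the size of that improvement.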
Original language: English
Title of host publication: Proceedings of the Eighth Conference on Machine Translation
Pages: 856-862
Publication status: Published - 1 Dec 2023
Event: Proceedings of the Eighth Conference on Machine Translation - Singapore
Duration: 1 Dec 2023 → 1 Dec 2023

