Abstract
Despite the superhuman performance Machine Reading Comprehension (MRC) models have achieved on existing benchmark datasets, investigating their behaviour under various types of test-time perturbations can shed light on how to enhance their robustness and generalisation capability. In this paper, we study the robustness of contemporary MRC systems to context paraphrasing, i.e., whether these models can still correctly answer the questions once the reading passages have been paraphrased. To this end, we systematically design a pipeline to semi-automatically generate perturbed MRC instances, which ultimately leads to the creation of a paraphrased test set. We conduct experiments on this dataset with six state-of-the-art neural MRC models and find that even the minimum performance drop across all these models exceeds 41%, whereas human performance remains high. Retraining the models with augmented perturbed examples improves robustness, though performance remains lower than on the original dataset. These results demonstrate that existing high-performing MRC systems are still far from real language understanding.
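To make concrete what "robustness to context paraphrasing" means in practice, below is a minimal sketch (not the authors' pipeline) that compares an extractive QA model's exact-match accuracy on original versus paraphrased contexts. The model checkpoint, the hand-written paraphrase, and the helper names are illustrative assumptions.

```python
# A minimal sketch (not the authors' pipeline): compare an extractive MRC model's
# exact-match accuracy on original vs. paraphrased contexts.
# The checkpoint name and the toy example below are illustrative assumptions.
from transformers import pipeline

# Any SQuAD-style extractive QA checkpoint could be substituted here.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def exact_match(pred: str, gold: str) -> bool:
    """Rough exact-match proxy: case- and whitespace-insensitive string equality."""
    return pred.strip().lower() == gold.strip().lower()

def paraphrase_accuracy(examples):
    """examples: dicts with 'question', 'context', 'paraphrased_context', 'answer'.
    Returns (accuracy on original contexts, accuracy on paraphrased contexts)."""
    orig_hits = para_hits = 0
    for ex in examples:
        pred_orig = qa(question=ex["question"], context=ex["context"])["answer"]
        pred_para = qa(question=ex["question"], context=ex["paraphrased_context"])["answer"]
        orig_hits += exact_match(pred_orig, ex["answer"])
        para_hits += exact_match(pred_para, ex["answer"])
    n = len(examples)
    return orig_hits / n, para_hits / n

# Hand-written toy instance; the paper generates paraphrases semi-automatically.
toy = [{
    "question": "Where did the conference take place?",
    "context": "The conference took place in Nusa Dua in November 2023.",
    "paraphrased_context": "In November 2023, Nusa Dua hosted the conference.",
    "answer": "Nusa Dua",
}]
print(paraphrase_accuracy(toy))  # a large gap between the two numbers signals fragility
```

A real evaluation would use the paraphrased test set described in the abstract and report the drop relative to the original benchmark; the sketch only shows the comparison mechanics.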
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics |
| Subtitle of host publication | Volume 2: Short Papers |
| Publisher | Association for Computational Linguistics |
| Pages | 184-196 |
| DOIs | |
| Publication status | Published - 4 Nov 2023 |
| Event | Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers) - Nusa Dua, Bali<br>Duration: 1 Nov 2023 → 1 Nov 2023 |
Conference
| Conference | Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers) |
|---|---|
| Period | 1/11/23 → 1/11/23 |