Coherence-based Document-level Text Simplification

Student thesis: PhD

Abstract

Text Simplification (TS) is an important task in Natural Language Processing in which a complex text is transformed into a simpler version. This makes the text more understandable for wider audiences, such as people with disabilities, non-native speakers, or people with limited expertise in specialised areas such as healthcare, law, public administration and assistive technologies (i.e., devices that aid people with disabilities). TS also plays a significant role in downstream applications such as summarisation, information extraction, question answering and machine translation. In the past decade, TS research has focused mainly on sentence-level simplification. In this approach, individual sentences are simplified without considering the context they belong to, which can disrupt the discourse. Moreover, TS audiences mainly need to understand complex documents rather than isolated sentences. In this thesis, we focus our research questions on the most important and challenging aspects of TS: evaluation and simplification at the document level. Firstly, we study the primary evaluation resources in TS to understand the limitations and improvements needed to transition to a document-level scenario. We present a detailed analysis of TS corpora based on simplification operations and their statistical distribution, and we expose the existing limitations in the field through a simplification benchmark that uses better-distributed datasets. We also make recommendations for building and evaluating new TS datasets. Secondly, we consider the evaluation of discourse connections between simplifications, across sentences and paragraphs (i.e., coherence), as a quality measure. We enhance a state-of-the-art sentence-level TS generation model with paraphrasing data for document-level simplification. Then, we demonstrate how coherence can be evaluated using neural methods, highlighting the challenges and limitations faced when performing TS at the document level.
Furthermore, we propose to assess coherence using better data representations and novel methods. Thirdly, we explore novel document-level TS methods supported by the evaluation methods studied in this dissertation. Finally, we introduce a simplification model that accounts for simplicity, readability and discourse aspects such as coherence. We hope that our contributions to document-level evaluation and simplification motivate the further development of this challenging research area.
Date of Award: 1 Aug 2024
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Sophia Ananiadou & Nhung Nguyen

Keywords

  • Document-level
  • Evaluation
  • Text Simplification
  • Natural Language Processing
  • Coherence
