Lexical Simplification: Optimising the Pipeline

  • Matthew Shardlow

Student thesis: PhD


Introduction: This thesis was submitted by Matthew Shardlow to the University of Manchester for the degree of Doctor of Philosophy (PhD) in the year 2015. Lexical simplification is the practice of automatically increasing the readability and understandability of a text by identifying problematic vocabulary and substituting easy-to-understand synonyms. This work describes the research undertaken during the course of a 4-year PhD. We have focused on the pipeline of operations which string together to produce lexical simplifications. We have identified key areas for research and allowed our results to influence the direction of our research. We have suggested new methods and ideas where appropriate.

Objectives: We seek to further the field of lexical simplification as an assistive technology. Although the concept of fully-automated error-free lexical simplification is some way off, we seek to bring this dream closer to reality. Technology is ubiquitous in our information-based society. Increasingly, we consume news, correspondence and literature through an electronic device. E-reading gives us the opportunity to intervene when a text is too difficult. Simplification can act as an augmentative communication tool for those who find a text is above their reading level. Texts which would otherwise go unread would become accessible via simplification.

Contributions: This PhD has focused on the lexical simplification pipeline. We have identified common sources of errors as well as the detrimental effects of these errors. We have looked at techniques to mitigate the errors at each stage of the pipeline. We have created the CW Corpus, a resource for evaluating the task of identifying complex words. We have also compared machine learning strategies for identifying complex words. We propose a new preprocessing step which yields a significant increase in identification performance. We have also tackled the related fields of word sense disambiguation and substitution generation. We evaluate the current state of the field and make recommendations for best practice in lexical simplification. Finally, we focus our attention on evaluating the effect of lexical simplification on the reading ability of people with aphasia. We find that, in our small-scale preliminary study, lexical simplification has a negative effect, causing reading time to increase. We evaluate this result and use it to motivate further work into lexical simplification for people with aphasia.
Date of Award: 31 Dec 2015
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: John McNaught (Supervisor) & Simon Harper (Supervisor)


  • Language Resource Evaluation
  • Complex Word Identification
  • Text Simplification
  • Lexical Simplification
  • Natural Language Processing
