Using Semantic Frames for Measuring and Identifying Semantic Relationships in Software Descriptions

  • Waad Alhoshan

Student thesis: PhD


As most software requirements are written in natural language, they are unstructured and do not adhere to any formalism. Therefore, automatically processing these requirements in the context of Requirements Engineering (RE) is often difficult, complex and opaque. The problems stem from linguistic issues, such as ambiguity and incompleteness. Techniques and resources from Natural Language Processing (NLP) have been used to explore natural language issues in unstructured requirement documents. Research in this hybrid area has covered various tasks, including analysing, modelling and organising requirements, which are generally referred to as NLP for RE (or simply NLP4RE) research tasks. An essential linguistic process common to most NLP4RE tasks is identifying relationships between requirement statements, i.e., detecting semantic relatedness and similarity within a requirement document viewed as a collection of software descriptions. By detecting such complex and (mostly) hidden relationships in the natural language description of requirements, we can build more accurate and robust NLP4RE tools that can handle the lack of formalism in unstructured requirement documents. For example, such tools could enable traceability across an arbitrary set of natural language documents by linking their shared semantic relationships, i.e., tracing requirements that involve specific concepts, such as sending/receiving operations or verifying user credentials for security purposes. This PhD thesis explores the potential of semantic frames, as embodied in the FrameNet lexicon, and adopts them to provide unique insights and novel approaches (accompanied by several methods implemented in systems) for measuring and identifying semantic relationships in software descriptions expressed in unstructured natural language.
We follow a research methodology that consists of collecting evidence of FrameNet's feasibility in RE, experimenting with various FrameNet-based solutions and critically appraising these solutions using real-world requirement documents. The first approach, the knowledge-based approach, is built on the knowledge available in the FrameNet lexicon, through which we experiment with various semantic similarity metrics used with different ontologies and lexica in FrameNet. The second approach, the corpus-supported approach, adapts FrameNet-tagged corpora, one of which results from the earlier research step that studied FrameNet's coverage of requirement documents. The corpus-supported approach uses corpus features, such as frame frequencies and co-occurrences, to measure the relatedness between frames in the RE context of use. The third and final approach, the embedding-based approach, is based on word embeddings trained for the RE domain. Accordingly, we propose new resources, i.e., embedding-based representations of the semantic frames in FrameNet. We obtain encouraging results from the corpus-based analysis, which was conducted to study FrameNet's appropriateness for labelling software descriptions. This research thereby creates the first RE corpus, consisting of 5,348 requirement statements, that is fully annotated with FrameNet frames. Afterwards, the proposed approaches for measuring the relatedness of semantic frames are evaluated on their designated task: identifying related semantic frames from FrameNet while considering the RE context. In the intrinsic evaluation, the results are compared with a human-judgment dataset of frame-to-frame relationships. As a result, the embedding-based approach achieves a more than satisfactory overall performance in measuring and identifying semantic relationships between FrameNet frames from an RE perspective.
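To make the corpus-supported and embedding-based ideas concrete, the following is a minimal sketch and not the thesis implementation. It assumes pointwise mutual information (PMI) as the co-occurrence measure and the mean of word vectors as the frame representation; neither choice is stated in the abstract. The annotations, trigger words and three-dimensional vectors are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy data (invented): each requirement statement is reduced to the set of
# frame labels assumed to be evoked in it.
ANNOTATED_REQUIREMENTS = [
    {"Sending", "Receiving"},
    {"Sending", "Receiving", "Verification"},
    {"Verification"},
    {"Sending"},
]

def frame_pmi(corpus, frame_a, frame_b):
    """Corpus-supported relatedness, assumed here to be PMI over statements."""
    n = len(corpus)
    freq = Counter(f for frames in corpus for f in frames)
    co = Counter(
        frozenset(pair)
        for frames in corpus
        for pair in combinations(sorted(frames), 2)
    )
    joint = co[frozenset((frame_a, frame_b))]
    if joint == 0:
        return float("-inf")  # the two frames never co-occur
    return math.log2((joint / n) / ((freq[frame_a] / n) * (freq[frame_b] / n)))

# Toy embeddings and frame trigger words (invented for illustration only).
WORD_VECTORS = {
    "send": [0.9, 0.1, 0.0],
    "transmit": [0.8, 0.2, 0.1],
    "receive": [0.7, 0.3, 0.0],
    "verify": [0.0, 0.2, 0.9],
    "authenticate": [0.1, 0.1, 0.8],
}
FRAME_TRIGGERS = {
    "Sending": ["send", "transmit"],
    "Receiving": ["receive"],
    "Verification": ["verify", "authenticate"],
}

def frame_vector(frame):
    """Embedding-based frame representation, assumed to be a mean of word vectors."""
    vectors = [WORD_VECTORS[w] for w in FRAME_TRIGGERS[frame]]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def frame_similarity(frame_a, frame_b):
    """Relatedness of two frames as the cosine of their frame vectors."""
    return cosine(frame_vector(frame_a), frame_vector(frame_b))

if __name__ == "__main__":
    for a, b in combinations(sorted(FRAME_TRIGGERS), 2):
        print(a, b, frame_pmi(ANNOTATED_REQUIREMENTS, a, b), frame_similarity(a, b))
```

In this toy setting both measures rank the Sending and Receiving pair above the Sending and Verification pair. Scores of this kind could then be compared against human judgments of frame-to-frame relatedness.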
For the extrinsic evaluation, we use the embedding-based approach in a requirement measurement technique to identify semantic relationships between natural language requirement statements. A satisfactory performance rate i
Date of Award: 1 Aug 2020
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Liping Zhao (Supervisor) & Riza Theresa Batista-Navarro (Supervisor)


  • Requirements Engineering
  • Software Requirements
  • Natural Requirements
  • Requirement Document
  • Software Description
  • RE
  • Semantic Relatedness
  • NLP
  • Natural Language Processing
  • FrameNet
  • Semantic Relationships
