Innovative Use of Natural Language Processing (NLP) to Extract Process Safety Insights from Challenging Freetext Sources

  • John Clay

Student thesis: PhD


Process Safety is concerned with the prevention and mitigation of major accidents in the process industries, such as fires, explosions and toxic releases. These events can harm people and the environment, both on site and off site. It is recognised that preventing such events requires the evaluation of data available within the operating company, across the industry sector and more widely across other sectors worldwide. Many organisations create vast quantities of such data. Some of it is held in coded databases, in which individual fields such as 'equipment type' can be populated only from a limited range of responses. However, many organisations also record information as unstructured text, or 'freetext'. Often this text is written for an individual reader who needs to understand the specific situation that prompted it, for example a condition monitoring report describing the health of one physical asset. There is great value in being able to aggregate learning from the insights contained within multiple documents. Unfortunately, the effort of doing this manually often means that it cannot be achieved at all, or limits the depth to which intelligence can be extracted.
Text Mining and Natural Language Processing (NLP) comprise a number of techniques which aim to extract useful information from collections of freetext data. Such techniques are widely applied across social media, search engines and other everyday applications. However, their deployment within engineering and Process Safety applications has been somewhat limited. Many of the techniques that extract the most value involve considerable overhead to deploy, which can be justified in mass-market applications such as social media but may not be justifiable in niche applications such as Process Safety.
The ultimate aim in extracting insight from freetext must be to share the resulting learning as widely as possible. Unfortunately, this too is fraught with difficulty. Concerns such as the accidental release of personal or commercial information, or reputational damage to organisations revealed not to be managing critical assets, mean that there are many barriers to transparency and to the sharing of learning. This body of work has explored the use of Text Mining and NLP techniques as applied to Process Safety, with the aim of balancing effort and reward. At the same time, industry's appetite for sharing data originating from freetext, and the conditions under which it would be willing to do so, have been evaluated through a survey. The 'state of the art' in NLP techniques has been compared with the practical application of such techniques in engineering and Process Safety. It is clear that there are many practical barriers which prevent the most advanced techniques from being applied in practice. More fundamental NLP techniques, such as part-of-speech tagging, have been trialled on real-world freetext data from a large dataset held by a key UK regulator for Process Safety issues in the onshore process industries. The research has shown that industry is significantly risk-averse when it comes to sharing data. The redaction of sensitive data may be required, but is not sufficient. NLP techniques based on 'off the shelf' but customised methods have proven efficient and effective at extracting insight, which can then be aggregated into coded data that provides great assurance around privacy concerns. The work concludes with the development of a Process Safety Data Sharing Framework to assist the selection and development of schemes that enable the extraction and sharing of process safety insights across industry.
Date of Award: 31 Dec 2023
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Moray Kidd


  • Loss of Containment
  • Incident Investigation
  • Toxic Release
  • Explosion
  • Process Safety
  • Text Mining
  • Natural Language Processing
  • Fire
