Applying Software Engineering Principles to Electronic Health Records in order to Improve Research and Generate Patient-Specific Actionable Information

Student thesis: PhD


Healthcare services increasingly collect data about their patients in electronic health records (EHRs). These data are also increasingly available for secondary uses, which include, but are not limited to: researchers using the data to perform retrospective observational studies or to recruit to randomised clinical trials; clinicians improving their performance with actionable information from clinical decision support or audit and feedback systems; and national bodies assessing and comparing the quality of care provided across a range of healthcare providers. The potential benefits of these secondary uses are great, but there is a problem. The data are collected with the primary purpose of direct patient care and therefore caution must be exercised in interpreting, extracting and transforming the data into the form required for the secondary uses. The methods that are used to extract and transform EHR data are of the utmost importance to the subsequent uses; however, they receive much less attention than other aspects of study methodology (such as the statistical analysis) and are chronically underreported in the literature. Without proper reporting and scrutiny of these methods, it is impossible to determine if mistakes or incorrect assumptions have affected the validity of the results. Therefore, confidence in the results is reduced and the impact of the research is diminished. The viewpoint presented in this thesis is that preparing EHR data for secondary uses is a form of software engineering and therefore should comply with software engineering principles. The objective of the thesis is to present a comprehensive collection of methods and tools, developed in accordance with the best software engineering principles, that bridge the gap between EHR data and secondary uses. These methods include the construction of clinical code sets, the extraction of clinical events from an EHR, and the analysis of sequences of clinical events. 
In particular, we focus on the use of these methods for observational studies using primary care EHR data, and for actionable information delivered via decision support systems. However, many of the methods apply to the full range of secondary uses. We need robust, open and transparent methodologies to increase confidence in the results of research using EHR data. Only then can we achieve a much-needed acceleration in healthcare research and realise the benefits of reproducible, patient-specific, actionable information.
Date of Award: 1 Aug 2021
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Niels Peek


Keywords
  • Electronic Health Records
  • Software Engineering
  • Electronic Audit and Feedback
  • Clinical Coding
  • Observational Research
