Maximising the utility of electronic health records: answering clinically important questions whilst expanding methods for data quality, transparency, and reproducibility

  • Rebecca Joseph

Student thesis: PhD


Electronic health records (EHR) have become a widely used data source for epidemiological research. Such datasets often contain rich, prospectively collected health data for a large number of patients. The data are less susceptible to recall bias than retrospectively collected data, and using existing data can be cheaper and more efficient than setting up a new prospective cohort study. However, there are recognised challenges to reusing data originally collected for another purpose, which could limit the value of research set within EHR. For example, the raw datasets are complex, so preparing EHR for analysis can be time-consuming and difficult to report transparently. In addition, routinely collected data, such as EHR, are typically of lower quality than data collected directly for research, which can increase the risk of measurement error and misclassification bias. This thesis explores the benefits and challenges of using EHR for health research, and presents methodologies developed to address some of the challenges. Eight publications are presented, arranged into three themes. First, examples of health research set within UK primary care EHR are presented, along with the contribution of these publications to the literature. Second, the challenge of preparing EHR for analysis is explored, with an example of developing and sharing a reusable and flexible data preparation algorithm as a means to improve the transparency and efficiency of the process. Third, the challenge of measurement error is explored. A novel study design combining EHR with data collected directly from patients is presented, alongside a validation study using patient-reported drug use information to quantify measurement error in prescription-derived estimates of drug exposure. The work presented in this thesis demonstrates that while studies set within EHR represent a valuable contribution to the literature, there remain a number of challenges to resolve. However, all data sources have limitations and, if appropriately accounted for, these challenges should not preclude the use of EHR for health research. A focus on transparency, validation, and replication of findings could help increase confidence in using EHR for research.
Date of Award: 31 Dec 2018
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: William Dixon


  • Electronic Health Records
  • Epidemiology
  • Pharmacoepidemiology
  • Measurement Error
  • Transparency
