Mining Patient Journeys From Healthcare Narratives

  • Azad Dehghan

Student thesis: Phd


The aim of the thesis is to investigate the feasibility of using text mining methods to reconstruct patient journeys from unstructured clinical narratives.A novel method to extract and represent patient journeys is proposed and evaluated in this thesis. A composition of methods were designed, developed and evaluated to this end; which included health-related concept extraction, temporal information extraction, and concept clustering and automated work-flow generation.A suite of methods to extract clinical information from healthcare narratives were proposed and evaluated in order to enable chronological ordering of clinical concepts. Specifically, we proposed and evaluated a data-driven method to identify key clinical events (i.e., medical problems, treatments, and tests) using a sequence labelling algorithm, CRF, with a combination of lexical and syntactic features, and a rule-based post-processing method including label correction, boundary adjustment and false positive filter. The method was evaluated as part of the 2012 i2b2 challengeand achieved a state-of-the-art performance with a strict and lenient micro F1-measure of 83.45% and 91.13% respectively. A method to extract temporal expressions using a hybrid knowledge- (dictionary and rules) and data-driven (CRF) has been proposed and evaluated. The method demonstrated the state-of-the-art performance at the 2012 i2b2 challenge: F1-measure of 90.48% and accuracy of 70.44% for identification and normalisation respectively. For temporal ordering of events we proposed and evaluated a knowledge-driven method, with a F1-measure of 62.96% (considering the reduced temporal graph) or 70.22% for extraction of temporal links. The method developed consisted of initial rule-based identification and classification components which utilised contextual lexico-syntactic cues for inter-sentence links, string similarity for co-reference links, and subsequently a temporal closure component to calculate transitive relations of the extracted links. In a case study of survivors of childhood central nervous system tumours (medulloblastoma), qualitative evaluation showed that we were able to capture specific trends part of patient journeys. An overall quantitative evaluation score (average precision and recall) of 94-100% for individual and 97% for aggregated patient journeys were also achieved. Hence, indicating that text mining methods can be used to identify, extract and temporally organise key clinical concepts that make up a patient's journey. We also presented an analyses of healthcare narratives, specifically exploring the content of clinical and patient narratives by using methods developed to extract patient journeys. We found that health-related quality of life concepts are more common in patient narrative, while clinical concepts (e.g., medical problems, treatments, tests) are more prevalent in clinical narratives. In addition, while both aggregated sets of narratives contain all investigated concepts; clinical narratives contain, proportionally, more health-related quality of life concepts than clinical concepts found in patient narratives. These results demonstrate that automated concept extraction, in particular health-related quality of life, as part of standard clinical practice is feasible.The proposed method presented herein demonstrated that text mining methods can be efficiently used to identify, extract and temporally organise key clinical concepts that make up a patient's journey in a healthcare system. Automated reconstruction of patient journeys can potentially be of value for clinical practitioners and researchers, to aid large scale analyses of implemented care pathways, and subsequently help monitor, compare, develop and adjust clinical guidelines both in the areas of chronic diseases where there is plenty of data and rare conditions where potentially there are no established guidelines.
Date of Award1 Aug 2015
Original languageEnglish
Awarding Institution
  • The University of Manchester
SupervisorGoran Nenadic (Supervisor) & John Keane (Supervisor)


  • Patient Pathway
  • Clinical Pathway
  • Patient Journey
  • Temporal Text Mining
  • Natural Language Processing
  • Clinical Text Mining

Cite this