Ascertaining Patterns of Asthma Symptoms in Childhood using Machine Learning Methods

  • Matea Deliu

Student thesis: PhD


The prevalence and incidence of asthma in children are continually rising, creating a burden on health systems through high health care costs. Asthma is a heterogeneous disease, but current definitions do not capture the heterogeneity of this complex condition. It is becoming increasingly clear that asthma is not a single disease but rather a collection of syndromes comprising a number of disease subtypes (‘endotypes’) with similar observable and measurable clinical characteristics (‘phenotypes’). Identifying true endotypes of asthma and disaggregating the heterogeneity of the disease is required for better targeting of treatment at pathophysiological mechanisms, and thus for delivering genuinely personalised pharmacological treatment in asthma. Methods of ascertaining these endotypes have ranged from investigator-led pattern identification in the clinical setting to supervised and unsupervised statistical modelling techniques that use large-scale data and computer algorithms to find the latent (hidden, unknown a priori) patterns of observable features (such as symptoms, medication use, allergic sensitization, and lung function). Data-driven approaches allow the data to essentially speak for themselves, without a priori hypotheses guiding the analysis. This ultimately eliminates investigator bias and enables novel hypotheses to be generated. Using two rich data sources (a Turkish population cohort and the Manchester Asthma and Allergy Study), both cross-sectionally and longitudinally, this thesis aimed to answer the following research questions: 1) Can we use data-driven methods to uncover patterns in asthma datasets, and how can this guide our further understanding of the disease? 2) Which main features of the asthma syndrome can be used to ascertain the heterogeneity of the disease? 3) How can we exploit the wealth of data provided by longitudinal birth cohorts in order to understand the severity of asthma?
Chapter 2 of the thesis introduces and explains in detail the machine learning methods, such as cluster analysis and latent class analysis, that are increasingly used to ascertain patterns of asthma phenotypes. Chapter 3 then puts this data-driven methodology in context by discussing the advances in knowledge gained from the use of these algorithms. Using cross-sectional data from Turkey, Chapter 4 creates a framework for the discovery of stable and clinically meaningful asthma subtypes by blending data with clinical expert domain knowledge to identify four main informative features (age of onset, atopy, exacerbations, severity). Building on this, Chapters 5 and 6 use longitudinal data to explore exacerbations and asthma severity in more detail. Two independent exacerbation subtypes were identified (frequent and infrequent exacerbations), along with three wheeze severity states (mild/moderate wheeze, severe wheeze, and transitioning wheeze). This thesis advances current knowledge of the heterogeneity of asthma by identifying novel results through the use of machine learning methodologies.
Date of Award: 1 Aug 2020
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Adnan Custovic (Supervisor), Matthew Sperrin (Supervisor) & Nophar Geifman (Supervisor)


Keywords
  • machine learning
  • endotypes
  • childhood asthma
  • bioinformatics
