Informatics strategies for the analysis and interpretation of high-throughput DNA sequencing datasets

Student thesis: PhD


Background: Massively parallel sequencing technologies (next-generation sequencing, NGS) have accelerated understanding of human genomics. The adoption of NGS in the study of human disease has transformed discovery and diagnostic genomics. These advances have been especially evident for human disorders characterized by immense heterogeneity, including Mendelian disorders: rare diseases that are often clinically and genetically heterogeneous and that are underpinned, in each individual, by mutations in a single gene. However, a number of limitations exist in the application of NGS technologies to the study of Mendelian disorders, particularly in the capability to detect and interpret the complete spectrum of variation within the human genome.
Methods: Using inherited retinal diseases (IRD) as a model set of heterogeneous disorders, the investigations presented in this thesis have examined trends in large cohorts of individuals, assessed the utility of different NGS approaches for the discovery of disease-causing genomic variation, and developed novel informatics strategies to detect and identify disease-causing variants from NGS datasets. The NGS approaches applied include whole exome sequencing, whole genome sequencing and large gene panels.
Results: This research has advanced knowledge of the genetic basis of IRD, demonstrated the advantage of personalized genomic medicine for individuals with IRD, identified limitations in currently employed NGS diagnostic services, and developed informatics strategies to overcome these limitations across a range of heterogeneous Mendelian disorders. Importantly, through the application of read-depth strategies for the identification of copy number variation (CNV), this research has increased the versatility of NGS datasets generated and analysed in clinically accredited laboratories.
Conclusions: The analysis frameworks applied in this research have set a paradigm for the analysis and integration of the 100,000 Genomes Project datasets within national genome medicine centres. These investigations have also identified a number of outstanding challenges for the analysis of NGS datasets in the context of Mendelian disorders. These challenges represent important and interesting topics for future research, including the expansion of analysis strategies to detect pathogenic non-coding and regulatory variation.
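The read-depth idea mentioned in the Results can be illustrated with a minimal sketch. This is not the method developed in the thesis; it is a generic log2 depth-ratio comparison of one sample against a reference panel, and the function names, thresholds and depth values are all hypothetical.

```python
import math
from statistics import median


def normalise(depths):
    """Convert raw per-target read depths to fractions of the sample total."""
    total = sum(depths)
    if total <= 0:
        raise ValueError("sample has no mapped reads")
    return [d / total for d in depths]


def call_cnv(sample_depths, reference_depths, loss_below=-0.6, gain_above=0.4):
    """Label each target 'loss', 'gain', 'normal' or 'no_call'.

    The thresholds are illustrative assumptions: a heterozygous deletion is
    expected near log2(1/2) = -1 and a duplication near log2(3/2) = 0.58.
    """
    sample = normalise(sample_depths)
    references = [normalise(r) for r in reference_depths]
    calls = []
    for i, fraction in enumerate(sample):
        # Baseline is the median normalised depth across the reference panel.
        baseline = median(r[i] for r in references)
        if baseline == 0:
            calls.append("no_call")  # target not covered in the panel
        elif fraction == 0:
            calls.append("loss")  # no reads at all in this sample
        else:
            ratio = math.log2(fraction / baseline)
            if ratio < loss_below:
                calls.append("loss")
            elif ratio > gain_above:
                calls.append("gain")
            else:
                calls.append("normal")
    return calls


# Hypothetical depths for five targets (for example, exons of one gene).
reference_panel = [[100] * 5, [200] * 5, [50] * 5]
patient = [100, 100, 50, 100, 150]
calls = call_cnv(patient, reference_panel)
print(calls)
```

With these made-up values the third target is labelled a loss and the fifth a gain. Normalising by total depth is the simplest possible choice and is itself distorted by large CNVs; real pipelines typically also correct for GC content and target capture efficiency.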
Date of Award: 1 Aug 2017
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Graeme Black & Janine Lamb


Keywords
  • genetics
  • genomics
  • Mendelian disease
  • Next-generation sequencing
  • Whole genome sequencing
  • Copy number variation
