From genomes to metabolomes: computational methods for systems and synthetic biology

Student thesis: PhD


Systems biology is a growing research area that aims to understand molecular, cellular and organismal biology at the systems level, i.e. at the level of the emergent properties arising from the tight interactions of biology’s functional components. At its best, it can mathematically model a complex biological system and predict its often non-intuitive behaviour in response to experimental perturbations. The resulting predictive power constitutes the foundation of synthetic biology, the advanced engineering of biological systems with useful new functions. Both of these closely related disciplines are experimental sciences and rely strongly on the data generated by post-genomic molecular profiling experiments. As more and larger datasets become available, the bottleneck of both systems and synthetic biology has shifted from data production to data analysis: as in other engineering disciplines, the development and application of computational tools for the analysis and interpretation of (biological) data, i.e. bioinformatics, has gained a central role in both fields. This thesis focuses on the development and application of new (or newly implemented) computational tools for systems and synthetic biology. Chapter 2 describes the implementation of the rank product and rank sum statistics in a substantially improved R package for the analysis of post-genomic datasets. In Chapter 3, this package is applied, together with a new and improved version of the iterative Group Analysis, to analyse the changes in the proteome observed during non-classical secretion of proteins in engineered strains of Escherichia coli BL21(DE3).
Chapter 4 expands on the theme of post-genomic molecular profiling analysis and describes the Integrated Probabilistic Annotation (IPA), a novel Bayesian method providing rigorous and reproducible metabolite annotation for LC/MS data obtained from untargeted metabolomics experiments. The final two chapters shift the focus to the earlier steps of the systems and synthetic biology pipeline, namely the characterization of the functional components of a complex biosynthetic system. Chapter 5 introduces the Output Ordering and Prioritisation System (OOPS), a publicly available software tool that prioritises biosynthetic gene clusters detected in microbial genome sequences according to a wide variety of custom-weighted biological and biochemical criteria. Finally, Chapter 6 describes a newly implemented unsupervised statistical method that detects a large number of modules (putative functional sub-clusters) within an extensive set of predicted biosynthetic gene clusters in a systematic and automated manner.
Date of Award: 1 Aug 2018
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Rainer Breitling & Eriko Takano


Keywords
  • biosynthetic gene clusters
  • proteomics
  • genomics
  • natural products
  • systems biology
  • metabolomics
  • synthetic biology
  • metabolite identification
  • biostatistics
