Interpretable machine learning methods for integrative analysis of single cell multiomics

  • Xin Ma

Student thesis: PhD

Abstract

In higher organisms, cells within different tissues undergo complex processes such as proliferation, differentiation and death, influenced by their surrounding environments. Underlying these processes is a complex, hierarchical system of gene regulatory control that defines the functional roles of cells. Recent methodological and technological advances enable the simultaneous profiling of different layers of cellular machinery, such as the genome, epigenome, transcriptome, proteome, and other emerging omic fields. These developments are transforming our understanding of biological mechanisms and of the connection between genotype and phenotype, thereby revolutionizing molecular cell biology. In my thesis, I present both established and cutting-edge multiomics technologies. I show how these technologies have evolved and improved over the past decade, and I address their challenges and limitations. I emphasize the significant impact of single-cell multiomics, in both fundamental and applied research, in areas such as cell lineage tracing, the creation of tissue- and cell-specific atlases, tumor immunology, cancer genetics, and the mapping of cellular spatial information. Finally, I discuss bioinformatics tools that have been developed to link different omics modalities and elucidate functionality through better mathematical modelling and computational methods.
In order to aid biological discovery, my aim is to develop interpretable computational methods for multiomics integrative analysis. Interpretation is crucial for machine learning methods applied to biological systems. It is necessary for identifying cell type-specific signatures in the observed feature space, unraveling complex feature interactions across omics layers, and inferring gene regulatory networks involving genes and proteins as well as regulatory genomic regions.
With the overall goal of developing interpretable methodologies to integrate multiomics data and understand gene regulatory mechanisms, I present three chapters, each proposing a self-contained project. Chapter 3 presents a two-step regulon-based gene regulatory network (GRN) inference method that combines single-cell RNA sequencing data with context-specific chromatin accessibility information. Chapter 4 presents an efficient triple non-negative matrix factorisation method for simultaneously identifying cell types and feature linkages across omics. In chapter 5, I introduce a novel deep generative model specifically designed to make downstream biological analysis more interpretable. The work presented in this thesis consistently focuses on computational methods for integrating multiomics data with interpretable outputs, elucidating biological mechanisms via clear mathematical models and efficient computational implementations.
Date of Award: 1 Aug 2024
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Magnus Rattray & Mudassar Iqbal
