Network Analysis of Non-treelike Patterns in Evolution

  • Yaqing Ou

Student thesis: PhD


Introgressive descent, through processes such as recombination, gene fusion and horizontal gene transfer (HGT), causes reticulate patterns in the evolutionary history of prokaryotes and eukaryotes that are too complex to represent in traditional tree-based models. In this thesis, we introduced network-based approaches such as the sequence similarity network (SSN) and explored their potential for investigating large datasets. Two different genetic features were investigated: (1) composite genes, which are generated by the remodelling of two unrelated genetic segments; and (2) CRISPR-Cas systems, which are widespread in prokaryotes as adaptive immune systems. First, we employed a network-based approach to explore gene remodelling. Non-homologous genes can be combined into a single open reading frame (ORF) through gene fusion. The new gene is called a composite gene, while the parental genes are called component genes. To investigate the distribution of composite genes across all of life, we constructed SSNs from a large dataset containing more than 1 million genes from prokaryotes, eukaryotes, viruses and plasmids. In our dataset, 18.57% of genes were identified as composite genes, which were found throughout all three domains of life and all COG functional categories. We also found that eukaryotic genes were more likely to be composites than prokaryotic genes. Second, we investigated the evolutionary history of the CRISPR-Cas locus. Prokaryotes are engaged in a constant arms race with foreign mobile genetic elements (MGEs). CRISPR-Cas, an important adaptive immune system in Archaea and Bacteria, is involved in diverse evolutionary processes. During an attack, a spacer is thought to be acquired directly from a segment of the invader and integrated between the leader sequence and the first spacer, so that spacers are ordered chronologically by time of infection.
However, through comparative genome analysis, we found old spacers located upstream of new spacers, which indicated a role for either ectopic spacer integration or recombination. Furthermore, we found that the distribution of CRISPR-Cas is not uniform across the prokaryotic phylogeny. To understand why, we used a co-occurrence approach to identify associations and dissociations between protein-coding genes and CRISPR-Cas systems. We found that genes co-occurring with CRISPR-Cas are mainly involved in metabolic pathways and that the distribution of co-occurring genes across the phylogeny is consistent with the distribution of CRISPR-Cas subtypes, which suggested that genetic background influences the distribution of CRISPR-Cas systems. Collectively, network-based approaches have shown great potential for identifying non-vertical evolution.
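
The composite/component relationship described in the abstract can be illustrated with a minimal sketch. This is not the pipeline used in the thesis, whose tools and thresholds are not stated here. It assumes pairwise similarity hits are available as (query, subject, query start, query end) tuples, and it flags a gene as composite when two other genes align to largely non-overlapping regions of it while showing no similarity to each other. The function name `find_composites`, the tuple layout and the `max_overlap` parameter are all hypothetical choices made for illustration.

```python
from itertools import combinations


def find_composites(hits, max_overlap=20):
    """Flag candidate composite genes from pairwise similarity hits.

    hits: iterable of (query, subject, q_start, q_end), where the
          coordinates give the aligned region on the query (1-based,
          inclusive). This layout is an assumption for illustration.
    max_overlap: largest number of shared query positions tolerated
          between the two component alignments (hypothetical default).

    Returns a dict mapping each candidate composite gene to the set of
    (component_a, component_b) pairs that support it.
    """
    similar = set()   # undirected edges of the similarity network
    by_query = {}     # query gene -> list of (subject, q_start, q_end)
    for query, subject, q_start, q_end in hits:
        if query == subject:
            continue  # ignore self-hits
        similar.add(frozenset((query, subject)))
        by_query.setdefault(query, []).append((subject, q_start, q_end))

    composites = {}
    for query, aligned in by_query.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(aligned, 2):
            if a == b:
                continue
            # Number of query positions covered by both alignments
            # (negative or zero when the regions are disjoint).
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap > max_overlap:
                continue  # both genes match the same region of the query
            if frozenset((a, b)) in similar:
                continue  # the candidate components are themselves similar
            composites.setdefault(query, set()).add(tuple(sorted((a, b))))
    return composites


if __name__ == "__main__":
    example_hits = [
        ("C", "A", 1, 100),    # A aligns to the first part of C
        ("C", "B", 120, 250),  # B aligns to the second part of C
        ("A", "C", 1, 100),
        ("B", "C", 1, 130),
        ("D", "A", 1, 90),     # D and A are ordinary homologues
        ("A", "D", 1, 90),
    ]
    print(find_composites(example_hits))
```

In this toy input, gene C is reported as a candidate composite with components A and B, whereas A is not, because its two hits (C and D) cover the same region. A real analysis would also filter hits by e-value, identity and coverage before applying such a rule.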
Date of Award: 31 Aug 2021
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: James Mcinerney (Supervisor) & Thomas House (Supervisor)


  • Introgressive descent
  • Horizontal gene transfer
  • Eukaryote
  • CRISPR-Cas
  • Composite genes
  • Prokaryote
  • Sequence similarity network
