Using phylogenetics and model selection to investigate the evolution of RNA genes in genomic alignments.

  • James Allen

    Student thesis: PhD


    The diversity and range of the biological functions of non-coding RNA molecules (ncRNA) have only recently been realised. Phylogenetic analysis of the RNA genes that define these molecules can provide important insights into the evolutionary pressures acting on them, and can lead to a better understanding of the structure and function of ncRNA. An appropriate dataset is fundamental to any evolutionary analysis, and because existing RNA alignments are unsuitable, I describe a software pipeline to derive RNA gene datasets from genomic alignments. RNA gene prediction software has not previously been evaluated on such sets of known RNA genes, and I find that two popular methods fail to predict the genes in approximately half of the alignments. In addition, high numbers of predictions are made in flanking regions that lack RNA genes. These results provide motivation for the subsequent phylogenetic analyses, because a better understanding of RNA gene evolution should lead to improved methods of prediction.

    I analyse the RNA gene alignments with a range of evolutionary models of substitution and examine which models best describe the changes evident in the alignments. The best models are expected to provide more accurate trees, and their properties can also shed light on the evolutionary processes that occur in RNA genes. Comparing DNA and RNA substitution models is non-trivial, however, because they describe changes between two different types of state, so I present a proof that allows models with different state spaces to be compared in a statistically valid manner. I find that a large proportion of RNA genes are well described by a single RNA model that includes parameters describing both nucleotides and RNA structure, highlighting the multiple levels of constraint that act on the genes. The choice of model affects the inference of a phylogenetic tree, suggesting that model selection, with RNA models, should be standard practice for analysis of RNA genes.
    Date of Award: 1 Aug 2014
    Original language: English
    Awarding Institution
    • The University of Manchester
    Supervisors: Simon Whelan & Simon Lovell


    • PHASE software
    • ncRNA
    • model selection
