Bayesian Methods for Gene Expression Analysis from High-Throughput Sequencing data

  • Peter Glaus

Student thesis: PhD


We study the tasks of transcript expression quantification and differential expression analysis based on data from high-throughput sequencing of the transcriptome (RNA-seq).

In an RNA-seq experiment subsequences of nucleotides are sampled from a transcriptome specimen, producing millions of short reads. The reads can be mapped to a reference to determine the set of transcripts from which they were sequenced. We can measure the expression of transcripts in the specimen by determining the amount of reads that were sequenced from individual transcripts.

In this thesis we propose a new probabilistic method for inferring the expression of transcripts from RNA-seq data. We use a generative model of the data that can account for read errors, fragment length distribution and non-uniform distribution of reads along transcripts. We apply the Bayesian inference approach, using the Gibbs sampling algorithm to sample from the posterior distribution of transcript expression. Producing the full distribution enables assessment of the uncertainty of the estimated expression levels.

We also investigate the use of alternative inference techniques for the transcript expression quantification. We apply a collapsed Variational Bayes algorithm which can provide accurate estimates of mean expression faster than the Gibbs sampling algorithm.

Building on the results from transcript expression quantification, we present a new method for the differential expression analysis. Our approach utilizes the full posterior distribution of expression from multiple replicates in order to detect significant changes in abundance between different conditions. The method can be applied to differential expression analysis of both genes and transcripts.

We use the newly proposed methods to analyse real RNA-seq data and provide evaluation of their accuracy using synthetic datasets. We demonstrate the advantages of our approach in comparisons with existing alternative approaches for expression quantification and differential expression analysis.

The methods are implemented in the BitSeq package, which is freely distributed under an open-source license. Our methods can be accessed and used by other researchers for RNA-seq data analysis.
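To illustrate the idea of sampling transcript expression with a Gibbs sampler, the following is a minimal sketch on a deliberately simplified mixture model. It is not the BitSeq implementation: it assumes each read already comes with a likelihood of originating from each transcript (standing in for the read error, fragment length and positional bias terms of the generative model), and it places a symmetric Dirichlet prior on the relative expression. All names (`gibbs_expression`, `sample_dirichlet`, `read_likelihoods`, `prior`) are hypothetical and chosen only for this example.

```python
import random
import statistics


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_expression(read_likelihoods, n_transcripts,
                     n_samples=500, burn_in=200, prior=1.0, seed=0):
    """Sample relative transcript expression (theta) given per-read likelihoods.

    read_likelihoods[n][m] is the (assumed, precomputed) likelihood of read n
    given transcript m; each read needs at least one positive entry.
    Returns a list of theta samples collected after burn-in.
    """
    rng = random.Random(seed)
    transcripts = range(n_transcripts)
    theta = [1.0 / n_transcripts] * n_transcripts
    samples = []
    for iteration in range(burn_in + n_samples):
        # Step 1: assign every read to a transcript, given the current theta.
        counts = [0] * n_transcripts
        for lik in read_likelihoods:
            weights = [theta[m] * lik[m] for m in transcripts]
            assigned = rng.choices(transcripts, weights=weights)[0]
            counts[assigned] += 1
        # Step 2: resample theta from its Dirichlet conditional given the counts.
        theta = sample_dirichlet([prior + c for c in counts], rng)
        if iteration >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    # Toy data: 60 reads unique to transcript 0, 20 unique to transcript 1,
    # and 20 reads that map equally well to transcripts 1 and 2.
    reads = ([[1.0, 0.0, 0.0]] * 60
             + [[0.0, 1.0, 0.0]] * 20
             + [[0.0, 0.5, 0.5]] * 20)
    posterior = gibbs_expression(reads, n_transcripts=3)
    for m in range(3):
        values = [s[m] for s in posterior]
        print(m, round(statistics.mean(values), 3), round(statistics.stdev(values), 3))
```

Because the sampler returns draws rather than a single point estimate, the spread of the draws for each transcript gives a direct view of the uncertainty. In this toy setting the transcripts that share ambiguous reads would be expected to show a wider posterior than the uniquely mapped one.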
Date of Award: 1 Aug 2014
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Jonathan Shapiro (Supervisor) & Magnus Rattray (Supervisor)


  • bayesian inference
  • gene expression
  • transcript expression
  • RNA-seq
  • differential expression
