Abstract

Background: Adenoviruses are an endemic cause of human infection, with 104 types across 7 species currently identified. Clinical syndromes vary, but infection in transplant recipients can be severe. Deficiencies in current classification methods are becoming apparent as the number of novel types continues to grow. Whole genome sequencing is employed in the investigation of pathogenicity, drug resistance, transmission dynamics and nosocomial outbreaks, but is rarely used for human Adenoviruses. Bioinformatic processing of sequencing data is particularly challenging as no commercial solutions or best practice guidelines currently exist. This project aimed to develop a method for direct enrichment of Adenovirus DNA from clinical samples for whole genome sequencing, alongside a validated open access pipeline for data analysis, in order to update current knowledge of Adenovirus epidemiology.

Method: A custom Agilent SureSelectXT RNA oligonucleotide Target Enrichment system was designed to enrich Adenovirus directly from faecal samples, blood and respiratory tract secretions for subsequent Illumina sequencing. Minor modifications were made to the SureSelectXT low input protocol to improve suitability for non-human applications. A bespoke three-stage Unix-coded bioinformatics pipeline was constructed and validated through direct comparison of open-source tool performance. Illumina sequencing data from multiple Adenovirus types were used to determine which quality trimming parameters, genome assembly tools and variant calling tools were suitable for inclusion. Genomic similarity of isolates was assessed at the whole genome and classical typing gene levels, both phylogenetically and through pairwise distance models, to assess current Adenovirus epidemiology.

Results: Adenovirus whole genomes could be enriched directly from various clinical sample matrices with a success rate of 94%. Quality trimming of sequence data using Trimmomatic did not affect whole genome sequence generation, regardless of the parameters used. No significant difference was observed between genome assemblies generated using BWA-MEM and Bowtie2; however, variant calling tool performance was less consistent, with LoFreq reporting the most variants. The starting concentration of Adenovirus DNA heavily influenced enrichment success, whereas Adenovirus species and type did not. Species C Adenovirus was the most commonly found, at 49%, closely followed by species B and A. Genetic differences between Adenoviruses of the same type were not always apparent from classical typing gene assessment and instead required whole genome analysis. Assessment of the hexon, penton and fiber genes found higher sequence divergence between Adenovirus types than previously observed.

Conclusions: This modified custom SureSelectXT protocol is a robust method for generating Adenovirus whole genome sequences from clinical samples, and suitable bioinformatic pipelines can be generated using only open-source tools. Adenovirus epidemiology in the North West has remained largely unchanged; however, phylogenetic analysis has highlighted deficiencies in current typing methods and confirms the need for whole genome analysis in outbreak investigation.
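
The abstract does not state which pairwise distance model was used. As a minimal illustration of the idea only, the sketch below computes the uncorrected p-distance (the proportion of differing sites) between pre-aligned sequences, which may differ from the model applied in the thesis. The function names `p_distance` and `pairwise_matrix`, the toy sequences, and the choice to skip gap (`-`) and ambiguous (`N`) positions are all assumptions made for this example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Assumption: positions holding a gap ('-') or 'N' in either
    sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return mismatches / compared


def pairwise_matrix(seqs):
    """Return {(name_x, name_y): distance} for every unordered pair of names."""
    names = sorted(seqs)
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for x, y in combinations(names, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the thesis.
    toy = {"isolate_1": "ACGTACGT", "isolate_2": "ACGTACGA", "isolate_3": "TCGTAC-A"}
    for pair, dist in pairwise_matrix(toy).items():
        print(pair, round(dist, 3))
```

Applying such a function to whole genome alignments and, separately, to hexon, penton and fiber gene alignments would give one way to compare how much variation each level of analysis can resolve.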
Date of Award: 1 Aug 2022
Supervisor: Pamela Vallely
- Whole Genome Sequencing
- Target Enrichment