The identification of pathogenic variants in Mendelian disease patients underpins disease management, genetic counselling and potentially treatment. Despite recent advances in next-generation sequencing (NGS), over half of Mendelian disease patients do not receive a molecular diagnosis for their disorder. A growing body of evidence suggests that disruption of pre-mRNA splicing is an under-analysed cause of pathogenesis in Mendelian disease. Variants affecting conserved splicing motifs in mRNA transcripts can lead to pathogenic mis-splicing, whereby stretches of sequence are erroneously inserted into or omitted from the canonical mRNA transcript. Recent years have seen a surge in the number of bioinformatics tools available to predict the effect of these variants and to analyse their effect in empirical functional assays. However, much remains to be learned about the efficacy of these predictive tools, and the identification of mis-splicing events in empirical datasets, such as those derived from RNA sequencing (RNA-seq), remains in its infancy. A deeper understanding of these areas promises to improve diagnostic yield for numerous cohorts of patients. Here, I apply novel bioinformatics analyses at multiple stages of the process from variant identification to functional corroboration, with the aim of improving diagnostic yield and the quality of variant reporting. I identify an optimal strategy for predicting the splicing impact of variants identified through upstream diagnostic testing, and show that the predictive tool SpliceAI provides the best accuracy in the analysis of clinical variants impacting splicing. I further develop a bespoke approach for investigating the subset of splice-impacting variants that affect the intronic branchpoint sequence, resulting in the identification of a causative pathogenic variant in the BBS1 gene. Finally, I develop a novel metric to guide the clinical integration of RNA-seq as a tool for investigating splice impact, which reveals disease- and tissue-specific use cases for RNA-seq in the investigation of mis-splicing.
- clinical bioinformatics