Towards Faster Gene Expression Prediction via Dimensionality Reduction and Feature Selection

Jeremy Watts, Elexis Allen, Ahmad Mitoubsi, Anahita Khojandi, James Eales, Theodore Papamarkou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


The majority of genes have a genetic component to their expression. Elastic nets have been shown to be effective at predicting tissue-specific, individual-level gene expression from genotype data. We apply principal component analysis (PCA), linkage disequilibrium pruning, or a combination of the two to obtain a lower-dimensional representation of the genetic variants used as inputs to the elastic net models that predict gene expression. Our results show that, in general, elastic nets attain their best performance when all genetic variants are included as inputs; however, a relatively small number of principal components can effectively summarize the majority of genetic variation while reducing the overall computation time. Specifically, 100 principal components reduce the computation time of the models by over 80% with only an 8% loss in R². Finally, linkage disequilibrium pruning does not effectively reduce the set of genetic variants for predicting gene expression. As predictive models are commonly built for over 27,000 genes across more than 50 tissues, PCA may provide an effective method for reducing the computational burden of gene expression analysis.
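The comparison described in the abstract (an elastic net fit on all variants versus one fit on 100 principal components) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the genotype matrix and expression values are synthetic, and the sample sizes, the number of causal variants, and the elastic net settings (`alpha=0.1`, `l1_ratio=0.5`) are assumptions chosen only so the example runs quickly. Because the synthetic variants are independent (no linkage disequilibrium), the R² values and timings it prints say nothing about the figures reported in the paper.

```python
import time

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for genotype data (assumption): 400 individuals by
# 2000 variants, coded as 0/1/2 minor-allele counts.
n_individuals, n_variants = 400, 2000
genotypes = rng.binomial(2, 0.3, size=(n_individuals, n_variants)).astype(float)

# Hypothetical expression trait (assumption): 20 causal variants plus noise.
causal = rng.choice(n_variants, size=20, replace=False)
effects = rng.normal(0.0, 1.0, size=causal.size)
expression = genotypes[:, causal] @ effects + rng.normal(0.0, 1.0, size=n_individuals)

X_train, X_test, y_train, y_test = train_test_split(
    genotypes, expression, test_size=0.25, random_state=0
)


def fit_and_score(X_tr, X_te, y_tr, y_te):
    """Fit an elastic net; return (test R^2, fit time in seconds)."""
    # alpha and l1_ratio are illustrative values, not taken from the paper.
    model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000)
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    return r2_score(y_te, model.predict(X_te)), elapsed


# Baseline: every variant is an input feature.
r2_full, t_full = fit_and_score(X_train, X_test, y_train, y_test)

# Reduced: project onto 100 principal components learned on the training
# split only, then fit the same model on the component scores.
pca = PCA(n_components=100, random_state=0)
Z_train = pca.fit_transform(X_train)
Z_test = pca.transform(X_test)
r2_pca, t_pca = fit_and_score(Z_train, Z_test, y_train, y_test)

print(f"all variants: R^2={r2_full:.3f}, fit time={t_full:.4f}s")
print(f"100 PCs:      R^2={r2_pca:.3f}, fit time={t_pca:.4f}s")
```

Fitting the PCA on the training split alone keeps information from the held-out individuals out of the projection, so the test R² stays an honest estimate. In practice the one-off cost of computing the projection would also be counted in any timing comparison.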
Original language: English
Title of host publication: 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Publication status: Published - 24 Jul 2023

