Abstract
Random forests (RFs) are effective at predicting gene expression from genotype data. However, comparisons of RF regressors and classifiers, including the effects of feature selection and encoding, remain under-explored in the context of gene expression prediction. Specifically, we examine the role of ordinal versus one-hot encoding, and of data balancing via oversampling, in the prediction of obesity-associated gene expression. Our work shows that RFs are competitive with PrediXcan in predicting obesity-associated gene expression in subcutaneous adipose tissue, a tissue highly relevant to obesity. Additionally, RFs generate predictions for obesity-associated genes for which PrediXcan fails to produce any.
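The abstract contrasts ordinal and one-hot genotype encodings and balancing by oversampling. The sketch below illustrates these generic preprocessing steps in plain Python. It assumes genotypes at a SNP are coded as alternative-allele counts (0, 1, 2) and that, for the classifier setting, expression has been discretized into class labels. The function names, the "low"/"high" labels, and the toy data are hypothetical and are not taken from the authors' pipeline, which may use a different oversampling scheme.

```python
import random
from collections import Counter

# Assumed coding: number of alternative alleles at one SNP
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative).
GENOTYPE_LEVELS = (0, 1, 2)


def ordinal_encode(genotypes):
    """Ordinal encoding: keep the allele count as one numeric feature."""
    return [[g] for g in genotypes]


def one_hot_encode(genotypes):
    """One-hot encoding: one binary indicator column per genotype level."""
    return [[1 if g == level else 0 for level in GENOTYPE_LEVELS] for g in genotypes]


def random_oversample(features, labels, seed=0):
    """Random oversampling (hypothetical helper): duplicate randomly chosen
    rows of each smaller class until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x = [list(row) for row in features]
    out_y = list(labels)
    for cls, n in counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            i = rng.choice(members)
            out_x.append(list(features[i]))
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    genotypes = [0, 1, 2, 1, 0, 0]                          # toy data
    labels = ["low", "low", "high", "low", "low", "high"]  # toy expression classes
    print(ordinal_encode(genotypes))
    print(one_hot_encode(genotypes))
    x_bal, y_bal = random_oversample(one_hot_encode(genotypes), labels)
    print(Counter(y_bal))
```

The ordinal form keeps the dosage order in a single column, so a tree can split 0 versus {1, 2} or {0, 1} versus 2 but needs two splits to isolate heterozygotes. The one-hot form lets a single split isolate any one genotype, at the cost of three columns per SNP.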
Original language | English |
---|---|
Pages | 4407-4410 |
Number of pages | 4 |
Publication status | Published - 8 Sept 2022 |
Event | 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) - Duration: 11 Jul 2022 → 15 Jul 2022 |
Keywords
- Algorithms
- Gene Expression
- Humans
- Obesity/genetics