Random forests (RFs) are effective at predicting gene expression from genotype data. However, a comparison of RF regressors and classifiers, including the effects of feature selection and encoding, remains under-explored in the context of gene expression prediction. Specifically, we examine the role of ordinal versus one-hot encoding and of data balancing via oversampling in the prediction of obesity-associated gene expression. Our work shows that RFs compete with PrediXcan in predicting obesity-associated gene expression in subcutaneous adipose tissue, a tissue highly relevant to obesity. Additionally, RFs generate predictions for obesity-associated genes where PrediXcan fails to do so.
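To make the encoding and balancing terms concrete, the following is a minimal sketch and not the authors' pipeline. It assumes biallelic genotype calls written as `AA`, `Aa`, and `aa`, hypothetical binned expression labels (`low`, `high`) for the classifier setting, and simple random duplication as the oversampling method; the abstract does not specify any of these details.

```python
import random
from collections import Counter

# Assumed genotype alphabet: homozygous reference, heterozygous, homozygous alternate.
GENOTYPES = ["AA", "Aa", "aa"]


def ordinal_encode(calls):
    """Map each genotype call to a single integer (0, 1 or 2)."""
    return [GENOTYPES.index(c) for c in calls]


def one_hot_encode(calls):
    """Map each genotype call to three 0/1 indicators, flattened per sample."""
    row = []
    for c in calls:
        row.extend(1 if c == g else 0 for g in GENOTYPES)
    return row


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: four samples, three SNPs each.
samples = [
    ["AA", "Aa", "aa"],
    ["Aa", "Aa", "AA"],
    ["aa", "AA", "AA"],
    ["AA", "AA", "Aa"],
]
labels = ["low", "low", "low", "high"]  # hypothetical expression classes

X_ordinal = [ordinal_encode(s) for s in samples]
X_onehot = [one_hot_encode(s) for s in samples]
X_bal, y_bal = random_oversample(X_ordinal, labels)
```

Ordinal encoding keeps one feature per variant and implies an ordering of genotypes, whereas one-hot encoding triples the feature count but makes no ordering assumption. Either matrix could then be passed to an RF regressor or classifier.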
|Number of pages||4|
|Publication status||Published - 8 Sep 2022|
|Event||44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 11 Jul 2022 → 15 Jul 2022|
- Gene Expression