TY - JOUR
T1 - An Evidential Reasoning Rule Based Feature Selection for Improving Trauma Outcome Prediction
AU - Almaghrabi, Fatima
AU - Xu, Dong-Ling
AU - Yang, Jian-Bo
PY - 2021/1/14
Y1 - 2021/1/14
N2 - Various demographic and medical factors can be linked to severe deterioration of patients suffering from traumatic injuries. Accurate identification of the most relevant variables is essential for building more accurate prediction models and making more rapid life-saving medical decisions. This paper aims to select a set of features that can be used to accurately predict patients’ outcomes, using three feature selection methods: random forest, ReliefF and the evidential reasoning (ER) rule. The impact of class imbalance in the outcome on feature selection is discussed, and the synthetic minority over-sampling technique (SMOTE) is applied to show the differences in the selected features. The results show that length of stay in hospital, length of stay in intensive care unit, age and Glasgow Coma Scale (GCS) are the most frequently selected features across the different techniques. The prediction models based on the features selected by the ER rule show the highest prediction performance, measured by the area under the receiver operating characteristic curve (AUC): the median AUC is 0.895 for the model that uses the ten highest-weighted variables, whereas the median AUC values are 0.827 and 0.885 when the ten highest-weighted variables are selected by ReliefF and random forest respectively. The results also show that, beyond the ten most important features, adding less important features yields only a slight increase in prediction accuracy.
M3 - Article
SN - 1568-4946
JO - Applied Soft Computing
JF - Applied Soft Computing
ER -