Abstract
Background: Heart failure with preserved or mildly reduced ejection fraction includes a heterogeneous group of patients. Reclassification into distinct phenogroups to enable targeted interventions is a priority. This study aimed to identify distinct phenogroups, compare phenogroup characteristics and outcomes, and identify factors to straightforwardly predict phenogroup membership, from electronic health record data.
Methods: A total of 2,187 patients admitted to five UK hospitals with a diagnosis of HF and a left ventricular ejection fraction > 40% were identified from the NIHR Health Informatics Collaborative database. Partition-based, model-based, and density-based machine learning clustering techniques were applied. Cox proportional hazards and Fine-Gray competing risks models were used to compare outcomes (all-cause mortality and hospitalisation for HF) across phenogroups. Penalised multinomial logistic regression was applied to predict phenogroup membership.
Results: Three phenogroups were identified: 1. Younger, predominantly female patients with high prevalence of cardiometabolic and coronary disease; 2. More frail patients, with higher rates of lung disease and atrial fibrillation; 3. Patients characterised by systemic inflammation and high rates of diabetes and renal dysfunction. Survival profiles were distinct, with an increasing risk of all-cause mortality from phenogroups 1 to 3 (p < 0.001). Phenogroup membership significantly improved survival prediction compared to conventional factors. Phenogroups were not predictive of hospitalisation for HF. A combination of ten variables assigned patients to phenogroups with 90% accuracy.
Conclusions: Applying unsupervised machine learning to routinely collected electronic health record data identified phenogroups with distinct clinical characteristics and unique survival profiles.
Original language | English |
---|---|
Article number | 343 |
Journal | BMC Cardiovascular Disorders |
Volume | 24 |
DOIs | |
Publication status | Accepted/In press - 19 Jun 2024 |
Keywords
- Heart failure with preserved or mildly reduced ejection fraction
- Machine learning
- Electronic health records