A population-based study exploring phenotypic clusters and clinical outcomes in stroke using unsupervised machine learning approach

Ralph K Akyea, George Ntaios, Evangelos Kontopantelis, Georgios Georgiopoulos, Daniele Soria, Folkert W Asselbergs, Joe Kai, Stephen F Weng, Nadeem Qureshi

Research output: Contribution to journal › Article › peer-review


Individuals who develop stroke vary in their clinical, demographic, and biochemical profiles. This heterogeneity in phenotypic characteristics can affect cardiovascular disease (CVD) morbidity and mortality outcomes. This study uses a novel clustering approach to stratify individuals with incident stroke into phenotypic clusters and evaluates the differential burden of recurrent stroke and other cardiovascular outcomes. We used linked clinical data from primary care, hospitalisations, and death records in the UK. A data-driven clustering analysis (kamila algorithm) was applied to 48,114 patients aged ≥ 18 years with incident stroke between 1 January 1998 and 31 December 2017 and no prior history of serious vascular events. Cox proportional hazards regression was used to estimate hazard ratios (HRs) for subsequent adverse outcomes in each of the generated clusters. Adverse outcomes included coronary heart disease (CHD), recurrent stroke, peripheral vascular disease (PVD), heart failure, CVD-related mortality, and all-cause mortality. Four distinct phenotypes with varying underlying clinical characteristics were identified in patients with incident stroke. Compared with cluster 1 (n = 5,201, 10.8%), the risk of the composite outcome of recurrent stroke and CVD-related mortality was higher in the other three clusters (cluster 2 [n = 18,655, 38.8%]: HR, 1.07; 95% CI, 1.02-1.12; cluster 3 [n = 10,244, 21.3%]: HR, 1.20; 95% CI, 1.14-1.26; and cluster 4 [n = 14,014, 29.1%]: HR, 1.44; 95% CI, 1.37-1.50). Similar trends in risk were observed for the composite outcome of recurrent stroke and all-cause mortality, and for subsequent recurrent stroke. However, results were not consistent for the subsequent risk of CHD, PVD, heart failure, CVD-related mortality, and all-cause mortality.
In this proof-of-principle study, we demonstrated how a heterogeneous population of patients with incident stroke can be stratified into four relatively homogeneous phenotypes with differential risk of recurrent stroke and major cardiovascular outcomes. This offers an opportunity to revisit the stratification of care for patients with incident stroke to improve patient outcomes.
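
The figures reported in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code: it recomputes the cluster shares from the reported sizes and back-calculates an approximate standard error of the log hazard ratio from each reported 95% CI. It assumes the intervals are Wald intervals on the log scale (z = 1.96), and the function and variable names are hypothetical.

```python
import math

# Cluster sizes and hazard ratios as reported in the abstract.
cluster_sizes = {1: 5201, 2: 18655, 3: 10244, 4: 14014}
# (HR, lower 95% CI, upper 95% CI) versus cluster 1 for the composite
# outcome of recurrent stroke and CVD-related mortality.
reported_hr = {2: (1.07, 1.02, 1.12), 3: (1.20, 1.14, 1.26), 4: (1.44, 1.37, 1.50)}

total = sum(cluster_sizes.values())
# Percentage share of each cluster, rounded to one decimal place.
shares = {k: round(100 * n / total, 1) for k, n in cluster_sizes.items()}


def log_hr_se(lower, upper, z=1.96):
    """Approximate SE of log(HR), assuming a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_from_log_hr(hr, se, z=1.96):
    """Rebuild a Wald confidence interval from an HR and the SE of its log."""
    beta = math.log(hr)
    return math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    print("total patients:", total)
    print("cluster shares (%):", shares)
    for cluster, (hr, lo, hi) in reported_hr.items():
        se = log_hr_se(lo, hi)
        print(f"cluster {cluster}: HR={hr}, approx SE(log HR)={se:.4f}")
```

Because the published values are rounded to two decimal places, the back-calculated standard errors are approximations rather than the fitted model's values.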

Original language: English
Pages (from-to): e0000334
Journal: PLOS Digital Health
Issue number: 9
Publication status: Published - 13 Sept 2023

