Predicting risk of dementia with machine learning and survival models using routine primary care records

John Langham, Daniel Stamate, Charlotte Wu, Fionn Murtagh, Catharine Morgan, David Reeves, Darren Ashcroft, Evan Kontopantelis, Brian McMillan

Research output: Contribution to conference › Paper › peer-review


Abstract: Worldwide, it is forecast that 131.5 million people will suffer from dementia by 2050, and that the annual cost of care will increase from 818 billion USD in 2016 to 2 trillion USD by 2030, with burgeoning social consequences. Given a timely prediction of a dementia outcome in patients, appropriate mitigating interventions can be applied to reduce risk. However, for interventions to have a meaningful quantitative impact, such prediction facilities need to be made available to wider populations, and they cannot rely on specialised, costly and invasive testing (such as neuroimaging and cerebrospinal fluid collection, which are important instruments in diagnosis). Hence there is an emerging need for the wider application of prognostic measures that can be deployed using lower-cost data sources, such as the longitudinal records routinely collected by general practices. This paper proposes an efficient prediction modelling approach to the risk of dementia, using CPRD data collected from GP practices in the UK. The approach is based on machine learning, in particular the Gradient Boosting Machines model, combined with a survival model such as the Cox Proportional Hazards model, and is encapsulated in a semi-supervised learning and model calibration methodology.
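The abstract does not say how the Gradient Boosting Machines output enters the survival model. As a minimal sketch only, the following assumes that a single GBM-derived risk score is used as the sole covariate of a Cox proportional hazards model, and evaluates the Cox partial log-likelihood for a given coefficient. The function name, the toy data, and the score values are hypothetical and are not taken from the paper.

```python
import math

def cox_partial_log_likelihood(times, events, scores, beta):
    """Cox partial log-likelihood for a single covariate.

    times  : follow-up time per patient
    events : 1 if the outcome was observed, 0 if censored
    scores : covariate per patient (assumed here to be a GBM risk score)
    beta   : Cox coefficient applied to the score
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        risk = [math.exp(beta * s) for t, s in zip(times, scores) if t >= t_i]
        total += beta * scores[i] - math.log(sum(risk))
    return total

# Hypothetical toy data: follow-up in years, 1 = diagnosis observed, 0 = censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
scores = [0.9, 0.4, 0.1]  # stand-in values for a GBM-derived risk score

ll_null = cox_partial_log_likelihood(times, events, scores, beta=0.0)
ll_fit = cox_partial_log_likelihood(times, events, scores, beta=1.0)
```

In this toy data the patients with higher scores have earlier events, so a positive coefficient gives a higher partial log-likelihood than the null model with `beta=0.0`. A real analysis would estimate `beta` by maximising this quantity and would then assess calibration separately.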
Original language: English
Number of pages: 3042
Publication status: Published - 9 Dec 2021
Event: IEEE International Conference on Bioinformatics and Biomedicine - Virtual Conference, Houston, United States
Duration: 9 Dec 2021 → 12 Dec 2021


Conference: IEEE International Conference on Bioinformatics and Biomedicine
Abbreviated title: IEEE BIBM
Country/Territory: United States


  • dementia risk
  • CPRD
  • primary care
  • prediction modelling
  • machine learning
  • classification
  • gradient boosting machines
  • Cox proportional hazards
  • model calibration

