Missing data was handled inconsistently in UK prediction models: a review of method used

Research output: Contribution to journal › Review article › Peer-review

Abstract

Objectives: No clear guidance exists on handling missing data at each stage of developing, validating and implementing a clinical prediction model (CPM). We aimed to review the approaches to handling missing data that underlie the CPMs currently recommended for use in UK healthcare.

Study Design and Setting: A descriptive cross-sectional meta-epidemiological study that identified CPMs recommended by the National Institute for Health and Care Excellence (NICE) and summarized how missing data were handled across their pipelines.

Results: A total of 23 CPMs were included through the “sampling strategy.” Six missing data strategies were identified: complete case analysis (CCA), multiple imputation, imputation of mean values, k-nearest neighbours imputation, using an additional category for missingness, and treating missing values as risk-factor-absent. Of the development articles, 52% did not report how missing data were handled, as did 48% of the validation articles. CCA was the most common approach in both development (40%) and validation (44%). At implementation, 57% of the CPMs required complete data entry, whilst 43% allowed missing values. Only three CPMs handled missing data consistently across all stages of their pipelines.
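To make the strategies listed above concrete, here is a minimal Python sketch of five of them applied to a small toy dataset. It is only an illustration. The article reviews methods and does not prescribe code, and the records, column names (`age`, `bmi`, `smoker`) and helper functions below are all hypothetical. Multiple imputation is left out: it needs several imputation models and pooling of estimates with Rubin's rules, which does not fit in a short example. The kNN helper assumes the distance features are fully observed.

```python
import math
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 50.0, "bmi": 27.0, "smoker": 1},
    {"age": 62.0, "bmi": None, "smoker": 0},
    {"age": 45.0, "bmi": 31.0, "smoker": None},
    {"age": 70.0, "bmi": 24.0, "smoker": 1},
]


def complete_case(rows):
    """Complete case analysis: keep only records with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows, col):
    """Replace missing values in `col` with the mean of its observed values."""
    m = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: m if r[col] is None else r[col]} for r in rows]


def missing_category(rows, col):
    """Encode missingness as an additional category of a categorical predictor."""
    return [{**r, col: "missing" if r[col] is None else r[col]} for r in rows]


def risk_factor_absent(rows, col):
    """Treat a missing binary risk factor as absent (coded 0)."""
    return [{**r, col: 0 if r[col] is None else r[col]} for r in rows]


def knn_impute(rows, col, features, k=2):
    """Fill a missing `col` with the mean of `col` over the k nearest donor
    records, using Euclidean distance on `features` (assumed fully observed)."""
    donors = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is not None:
            out.append(dict(r))
            continue
        target = [r[f] for f in features]
        nearest = sorted(
            donors, key=lambda d: math.dist([d[f] for f in features], target)
        )[:k]
        out.append({**r, col: mean(d[col] for d in nearest)})
    return out


print(len(complete_case(rows)))                         # records surviving CCA
print(mean_impute(rows, "bmi")[1]["bmi"])               # mean of 27, 31, 24
print(knn_impute(rows, "bmi", ["age"], k=2)[1]["bmi"])  # donors aged 70 and 50
```

The sketch shows how much each choice changes the data a model sees. CCA discards half of these records. The other helpers keep every record but fill the gap in different ways, so the imputed BMI for the same patient depends on the method. That is one reason an inconsistent choice between development, validation and implementation can matter.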

Conclusion: A broad variety of methods for handling missing data underlie the CPMs currently recommended for use in UK healthcare, and missing data handling strategies were generally inconsistent. Better quality assurance of CPMs requires greater clarity and consistency in the handling of missing data.
Original language: English
Pages (from-to): 149-158
Number of pages: 10
Journal: Journal of Clinical Epidemiology
Volume: 140
Early online date: 11 Sept 2021
Publication status: Published, Dec 2021

Keywords

  • Imputation
  • Missing data
  • Missing data handling approaches
  • Predictive medicine
  • Prognosis
  • Statistical models
