Abstract
Objectives: No clear guidance exists on handling missing data at each stage of developing, validating and implementing a clinical prediction model (CPM). We aimed to review the approaches to handling missing data that underlie the CPMs currently recommended for use in UK healthcare.
Study Design and Setting: A descriptive cross-sectional meta-epidemiological study that identified CPMs recommended by the National Institute for Health and Care Excellence (NICE) and summarized how missing data are handled across their pipelines.
Results: A total of 23 CPMs were included through the sampling strategy. Six missing data strategies were identified: complete case analysis (CCA), multiple imputation, imputation of mean values, k-nearest neighbours imputation, using an additional category for missingness, and considering missing values as risk-factor-absent. 52% of the development articles and 48% of the validation articles did not report how missing data were handled. CCA was the most common approach used for development (40%) and validation (44%). At implementation, 57% of the CPMs required complete data entry, whilst 43% allowed missing values. Three CPMs had consistent missing data handling paths across their pipelines.
Conclusion: A broad variety of methods for handling missing data underlie the CPMs currently recommended for use in UK healthcare. Missing data handling strategies were generally inconsistent. Better quality assurance of CPMs needs greater clarity and consistency in the handling of missing data.
Original language | English |
---|---|
Pages (from-to) | 149-158 |
Number of pages | 10 |
Journal | Journal of Clinical Epidemiology |
Volume | 140 |
Early online date | 11 Sept 2021 |
Publication status | Published - Dec 2021 |
Keywords
- Imputation
- Missing data
- Missing data handling approaches
- Predictive medicine
- Prognosis
- Statistical models