Abstract
Background
When developing a clinical prediction model using time-to-event data (i.e., with censoring and different lengths of follow-up), previous research has focused on the sample size needed to minimise overfitting and to precisely estimate the overall risk. However, instability of individual-level risk estimates may still be large.
Methods
We propose using a decomposition of Fisher’s information matrix to help examine and calculate the sample size required for developing a model that aims for precise and fair risk estimates. We propose a six-step process which can be used either before data collection or when an existing dataset is available. Steps (1) to (5) require researchers to specify: the overall risk in the target population at a key time-point of interest; an assumed pragmatic ‘core model’ in the form of an exponential regression model; the (anticipated) joint distribution of core predictors included in that model; and the distribution of any censoring. The ‘core model’ can be specified directly or based on a specified C-index and relative effects of (standardised) predictors. The joint distribution of predictors may be available directly in an existing dataset, in a pilot study, or in a synthetic dataset provided by other researchers.
Results
We derive closed-form solutions that decompose the variance of an individual’s estimated event rate into Fisher’s unit information matrix, predictor values and total sample size; this allows researchers to calculate and examine uncertainty distributions around individual risk estimates and misclassification probabilities for specified sample sizes. We provide an illustrative example in breast cancer and emphasise the importance of clinical context, including any risk thresholds for decision making, and examine fairness concerns for pre- and post-menopausal women. Lastly, in two empirical evaluations, we provide reassurance that uncertainty interval widths based on our exponential approach are close to those obtained using more flexible parametric models.
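The variance decomposition described above can be illustrated with a short numerical sketch. All quantities below (baseline rate, predictor effects, censoring time, time-point) are illustrative assumptions, not values from the paper, and the code is a simplified analogue of the approach rather than the pmstabilityss implementation: for an exponential model, the expected unit information is E[Pr(event | x) · x xᵀ], so the variance of an individual’s estimated log event rate is approximately xᵀ I⁻¹ x / n, which translates into an uncertainty interval for that individual’s risk at the key time-point.

```python
import numpy as np

rng = np.random.default_rng(7)

# --- Illustrative assumptions (not values from the paper) ---
t_star = 5.0          # key time-point of interest (e.g., years)
cens_time = 10.0      # assumed administrative censoring time
beta = np.array([np.log(0.02), 0.5, -0.3])  # intercept + 2 standardised predictors

# (Anticipated) joint predictor distribution: two standard-normal predictors,
# approximated here by a large Monte Carlo sample
M = 100_000
X = np.column_stack([np.ones(M), rng.standard_normal(M), rng.standard_normal(M)])

rate = np.exp(X @ beta)                      # exponential event rates
p_event = 1.0 - np.exp(-rate * cens_time)    # Pr(event observed before censoring)

# Unit Fisher information for exponential regression:
# I = E[ Pr(event | x) * x x^T ]  (the expected events per person drive the information)
I_unit = (X * p_event[:, None]).T @ X / M
I_inv = np.linalg.inv(I_unit)

def risk_interval_width(x, n, t=t_star):
    """Approximate 95% uncertainty interval width for an individual's t-year risk
    when the model is developed on n participants."""
    eta = x @ beta                           # true log event rate (known here by construction)
    se = np.sqrt(x @ I_inv @ x / n)          # SE of the estimated log rate
    lo = 1.0 - np.exp(-np.exp(eta - 1.96 * se) * t)
    hi = 1.0 - np.exp(-np.exp(eta + 1.96 * se) * t)
    return hi - lo

x_new = np.array([1.0, 1.0, 0.0])            # an individual one SD above the mean on predictor 1
for n in (500, 2000, 10_000):
    print(f"n = {n:>6}: 95% UI width for {t_star}-year risk = {risk_interval_width(x_new, n):.4f}")
```

Because the individual-level standard error scales as 1/√n, the interval width shrinks roughly fourfold when the development sample grows twentyfold; in the proposed process this relationship is inverted to find the n that achieves a target precision for individuals across the predictor distribution.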
Conclusions
Our approach allows users to identify the (target) sample size required to develop a prediction model for time-to-event outcomes, via the pmstabilityss module. It aims to facilitate models with improved trust, reliability and fairness in individual-level predictions.
| Original language | English |
|---|---|
| Journal | Diagnostic and Prognostic Research |
| Publication status | Accepted/In press - 30 Jun 2025 |
Fingerprint
Dive into the research topics of 'A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models - Part 2: time-to-event outcomes'. Together they form a unique fingerprint.
Projects
- HOD2: Toward Holistic Approaches to Clinical Prediction of Multi-Morbidity: A Dynamic Synergy of Inter-Connected Risk Models. (Finished)
Martin, G. (PI), Peek, N. (CoI), Sergeant, J. (CoI) & Van Staa, T. (CoI)
1/05/20 → 30/04/23
Project: Research