A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models - Part 2: time-to-event outcomes

Richard D. Riley, Gary S. Collins, Lucinda Archer, Rebecca Whittle, Amardeep Legha, Laura Kirton, Paula Dhiman, Mohsen Sadatsafavi, Nicola J. Adderley, Joseph Alderman, Glen P. Martin, Joie Ensor

Research output: Contribution to journal, Article, peer-review

Abstract

Background
When developing a clinical prediction model using time-to-event data (i.e., with censoring and different lengths of follow-up), previous research has focused on the sample size needed to minimise overfitting and to estimate the overall risk precisely. However, instability of individual-level risk estimates may still be large.

Methods
We propose using a decomposition of Fisher’s information matrix to help examine and calculate the sample size required for developing a model that aims for precise and fair risk estimates. We propose a six-step process that can be used either before data collection or when an existing dataset is available. Steps (1) to (5) require researchers to specify the overall risk in the target population at a key time-point of interest; an assumed pragmatic ‘core model’ in the form of an exponential regression model; the (anticipated) joint distribution of core predictors included in that model; and the distribution of any censoring. The ‘core model’ can be specified directly or based on a specified C-index and relative effects of (standardised) predictors. The joint distribution of predictors may be available directly in an existing dataset, in a pilot study, or in a synthetic dataset provided by other researchers.
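
As a rough illustration of how a 'core model' might be tied to a specified overall risk, the sketch below chooses the intercept of an exponential regression model so that the average predicted risk at a key time-point matches a target value. It is not taken from the paper or from pmstabilityss. The single standardised predictor, the log rate ratio of 0.5, the 5-year time-point, the 20% target risk and the function names `mean_risk` and `calibrate_intercept` are all hypothetical choices made for illustration.

```python
import math
import random


def mean_risk(b0, b1, xs, t):
    """Average risk by time t under an exponential model with rate exp(b0 + b1*x)."""
    return sum(1.0 - math.exp(-math.exp(b0 + b1 * x) * t) for x in xs) / len(xs)


def calibrate_intercept(target_risk, b1, xs, t, lo=-20.0, hi=5.0, tol=1e-10):
    """Find b0 by bisection; the mean risk increases monotonically with b0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid, b1, xs, t) < target_risk:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


random.seed(1)
# Hypothetical predictor distribution: one standardised (mean 0, SD 1) predictor.
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
b1 = 0.5        # assumed log rate ratio per SD of the predictor
t_key = 5.0     # assumed key time-point (e.g. years)
target = 0.20   # assumed overall risk by t_key in the target population

b0 = calibrate_intercept(target, b1, xs, t_key)
print("intercept:", round(b0, 3))
print("mean risk at t_key:", round(mean_risk(b0, b1, xs, t_key), 4))
```

In practice the predictor values would come from an existing, pilot or synthetic dataset rather than a simulated normal distribution, and the model would usually contain several predictors.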

Results
We derive closed-form solutions that decompose the variance of an individual’s estimated event rate into Fisher’s unit information matrix, predictor values and total sample size; this allows researchers to calculate and examine uncertainty distributions around individual risk estimates and misclassification probabilities for specified sample sizes. We provide an illustrative example in breast cancer and emphasise the importance of clinical context, including any risk thresholds for decision making, and examine fairness concerns for pre- and post-menopausal women. Lastly, in two empirical evaluations, we provide reassurance that uncertainty interval widths based on our exponential approach are close to those obtained using more flexible parametric models.
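
The kind of decomposition described above can be sketched for the simplest case. For an exponential model with rate exp(b0 + b1*x), one participant's expected information contribution is P(event observed | x) multiplied by the outer product of (1, x) with itself, so the variance of an individual's estimated log-rate is approximately v' I^{-1} v / n, where v = (1, x0), I is the unit information matrix and n is the total sample size. The code below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single predictor, administrative censoring at a fixed time only, and hypothetical parameter values and function names.

```python
import math
import random


def unit_information(b0, b1, xs, censor_time):
    """Unit (per-participant) expected information for (b0, b1).

    Returns the three distinct entries (a, b, d) of the symmetric 2x2
    matrix [[a, b], [b, d]], averaged over the predictor values xs.
    """
    a = b = d = 0.0
    for x in xs:
        rate = math.exp(b0 + b1 * x)
        # Probability the event is observed before administrative censoring.
        p_event = 1.0 - math.exp(-rate * censor_time)
        a += p_event
        b += p_event * x
        d += p_event * x * x
    m = len(xs)
    return a / m, b / m, d / m


def var_log_rate(x0, info, n):
    """Approximate variance of the estimated log-rate at x0 for sample size n."""
    a, b, d = info
    det = a * d - b * b
    # v' I^{-1} v with v = (1, x0), using the closed-form 2x2 inverse.
    return (d - 2.0 * b * x0 + a * x0 * x0) / (det * n)


def risk_interval(x0, b0, b1, info, n, t, z=1.96):
    """Approximate 95% interval for the risk by time t, via the log-rate scale."""
    lp = b0 + b1 * x0
    se = math.sqrt(var_log_rate(x0, info, n))

    def to_risk(v):
        return 1.0 - math.exp(-math.exp(v) * t)

    return to_risk(lp - z * se), to_risk(lp), to_risk(lp + z * se)


random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical predictor values
b0, b1 = -3.2, 0.5                                  # assumed core model parameters
info = unit_information(b0, b1, xs, censor_time=5.0)

for n in (500, 2000, 8000):
    lower, point, upper = risk_interval(1.5, b0, b1, info, n, t=5.0)
    print(n, round(lower, 3), round(point, 3), round(upper, 3))
```

Because the sample size enters only through the factor 1/n, the unit information needs to be computed once, and intervals for any candidate sample size and any individual's predictor values then follow directly. Repeating the calculation across subgroups is one way to inspect whether some groups would receive noticeably less precise estimates.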

Conclusions
Our approach allows users to identify the (target) sample size required to develop a prediction model for time-to-event outcomes, via the pmstabilityss module. It aims to facilitate models with improved trust, reliability and fairness in individual-level predictions.
Original language: English
Journal: Diagnostic and Prognostic Research
Publication status: Accepted/In press - 30 Jun 2025
