Abstract
Purpose
To compare functional principal component analysis (FPCA), a more complex technique, with simpler methods for estimating values of sparse and irregularly spaced continuous variables at given time points in longitudinal data, using a diabetic patient cohort from UK primary care.
Methods
The study used the Clinical Practice Research Datalink (CPRD), a UK general practice research database. For 16,034 diabetic patients identified in CPRD with at least two HbA1c measures in a 30-month period, HbA1c was estimated after temporarily omitting (i) the final and (ii) the middle known value, using linear interpolation, simple linear regression, the arithmetic mean, random effects and FPCA. The performance of each method was assessed using mean prediction error. The influence on predictive accuracy of (1) more homogeneous populations and (2) the number and range of known HbA1c values was also explored.
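The hold-out design described above can be illustrated with a minimal Python sketch for three of the simpler estimators. This is not the authors' code: the function names, the toy series, the choice to carry the nearest known value when the target time lies outside the known range, and the signed definition of mean prediction error are all assumptions made for illustration. The random effects and FPCA estimators are not shown.

```python
from statistics import mean


def hold_out(times, values, index):
    """Split one patient's series into the known points and the omitted point."""
    known_t = times[:index] + times[index + 1:]
    known_v = values[:index] + values[index + 1:]
    return known_t, known_v, times[index], values[index]


def mean_estimate(known_t, known_v, t):
    """Arithmetic mean of the known values (ignores time)."""
    return mean(known_v)


def regression_estimate(known_t, known_v, t):
    """Least-squares straight line through the known points, evaluated at t."""
    t_bar, v_bar = mean(known_t), mean(known_v)
    sxx = sum((x - t_bar) ** 2 for x in known_t)
    if sxx == 0:  # all known times identical: no slope can be fitted
        return v_bar
    sxy = sum((x - t_bar) * (y - v_bar) for x, y in zip(known_t, known_v))
    return v_bar + (sxy / sxx) * (t - t_bar)


def interpolation_estimate(known_t, known_v, t):
    """Linear interpolation between neighbouring known points.

    Assumption: outside the known range the nearest known value is carried.
    """
    points = sorted(zip(known_t, known_v))
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 < t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def mean_prediction_error(predicted, observed):
    """Assumed definition: average of (predicted - observed)."""
    return mean(p - o for p, o in zip(predicted, observed))


# Hypothetical patient: measurement times in months and HbA1c values.
times = [0, 6, 12, 18]
hba1c = [8.0, 7.5, 7.0, 6.5]

known_t, known_v, t, truth = hold_out(times, hba1c, index=len(times) - 1)
for estimator in (mean_estimate, regression_estimate, interpolation_estimate):
    print(estimator.__name__, estimator(known_t, known_v, t), "observed:", truth)
```

Passing `index=1` instead omits a middle value, in which case `interpolation_estimate` interpolates between the two neighbouring measurements rather than carrying the last one.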
Results
When estimating the last observation, FPCA had the highest predictive accuracy, with over half of predicted values within 0.4 units, a margin equivalent to laboratory measurement error. Predictive accuracy improved when estimating the middle observation, with almost 60% of predicted values within 0.4 units for FPCA. These results were only marginally better than those achieved by simpler approaches, such as last-occurrence-carried-forward (LOCF) linear interpolation. This pattern persisted in more homogeneous populations and when the variability of HbA1c measures and the frequency of data points were taken into account.
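The "within 0.4 units" figures are shares of predictions that fall inside a tolerance band around the observed value. A minimal sketch of that calculation follows; the function name and the example numbers are hypothetical and are not taken from the study data.

```python
def share_within(predicted, observed, tolerance=0.4):
    """Fraction of predictions within `tolerance` units of the observed value."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical predicted and observed HbA1c values for four patients.
predicted = [7.0, 8.1, 6.5, 9.0]
observed = [7.2, 7.5, 6.6, 9.3]
print(share_within(predicted, observed))
```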
Conclusions
When estimating change from baseline to pre-specified time points in electronic medical records data, the more complex modelling approach of FPCA offers only a marginal benefit over more traditional methods.
Original language | English |
---|---|
Pages (from-to) | 1474-1482 |
Journal | Pharmacoepidemiology and Drug Safety |
Volume | 26 |
Issue number | 12 |
Early online date | 15 Aug 2017 |
DOIs | |
Publication status | Published - 2017 |
Keywords
- sparse longitudinal data
- linear interpolation
- functional principal component analysis
- mean prediction error
- continuous variable
- predictive accuracy