TY - UNPB
T1 - Evaluating pay-for-performance programs in health care: a comparison of synthetic control and difference-in-differences approaches
AU - Kreif, N
AU - Grieve, R
AU - Hangartner, D
AU - Nikolova, S
AU - Turner, A
AU - Sutton, S
PY - 2014
Y1 - 2014
N2 - Policy-makers worldwide are introducing pay-for-performance (P4P) schemes in health care without adequate evaluation. In 2008 the Advancing Quality (AQ) initiative, based on the US Hospital Quality Incentive Demonstration, was introduced for all hospitals in the north-west region of England. Published evaluations of the AQ program have used difference-in-differences (DiD) regression to compare 30-day risk-adjusted hospital mortality 18 months before and after the program’s introduction in the North West and the rest of England for patients admitted with three of the incentivised conditions: pneumonia, heart failure and acute myocardial infarction. They concluded that the AQ program led to a significant reduction in risk-adjusted mortality for patients admitted with pneumonia, and was cost-effective. However, this approach assumed that, without the AQ scheme, the two groups of hospitals would have followed parallel trends in risk-adjusted mortality. We contrast DiD regression with the synthetic control method, developed by Abadie and colleagues, which generalises DiD by allowing heterogeneous responses over time to unobserved common factors. A synthetic control group is defined as a weighted average of control units, with the weights chosen to minimise differences in the changes over time in the pre-intervention outcomes and covariates between the comparison groups. This approach has not previously been considered in evaluations of P4P programs for health care providers. This setting requires extending the method, originally developed for evaluating treatment effects for a single aggregated treated unit, to data with multiple treated units (e.g. hospitals). This extension can inform policy-makers on potentially heterogeneous effects of P4P on different types of hospitals. This paper estimates the effects of the AQ scheme on risk-adjusted mortality overall, and according to hospital type (teaching, large, medium, small), for patients admitted with pneumonia.
For each hospital in the North West (n=23), we generated a synthetic control by weighting control hospitals (n=122), to balance the patterns of risk-adjusted mortality prior to the introduction of AQ. The algorithm also used information on pre-intervention covariates such as hospital quality and aggregated-level case-mix variables. We estimated the effect of the AQ program by contrasting mortality between the North West and synthetic control groups for the 18-month period after AQ was introduced. Uncertainty in the quality of the synthetic controls was assessed with placebo tests undertaken at regional and subgroup level. These tests compared the magnitude of the estimated treatment effects with the corresponding effects estimated from applying the same procedure for control hospitals only. The synthetic control groups had similar pre-intervention trajectories for risk-adjusted mortality compared with the North West hospitals. The synthetic control approach reported that the effect of the AQ scheme on risk-adjusted mortality was small and not statistically significant, both overall (-0.2, p=0.98) and for each subgroup. We conclude that the synthetic control method is an attractive approach for evaluating P4P schemes. By minimising differences between the comparison groups in pre-intervention outcomes, the synthetic control method provides estimates of program impacts that are more robust to time-varying unobservable heterogeneity between the intervention group and the potential controls than traditional DiD approaches.
M3 - Working paper
BT - Evaluating pay-for-performance programs in health care: a comparison of synthetic control and difference-in-differences approaches
ER -