An evaluation of the bootstrap for model validation in mixture models

Thomas Jaki, Ting Li Su, Minjung Kim, M. Lee Van Horn

Research output: Contribution to journal › Article › peer-review

Abstract

Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and for demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models. The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, where sub-samples are taken without replacement, does not suffer from the same problem.
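The resampling contrast at the heart of the abstract can be illustrated with a minimal sketch (function names are illustrative, not from the paper): the non-parametric bootstrap draws n observations with replacement, so an influential observation can recur many times in one sample, whereas a leave-k-out sub-sample of size n − k is drawn without replacement and never duplicates an observation.

```python
import random

def bootstrap_sample(data, rng):
    # Non-parametric bootstrap: draw n observations WITH replacement,
    # so influential observations can appear many times in one sample.
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def leave_k_out_sample(data, k, rng):
    # Leave-k-out cross-validation: draw a sub-sample of size n - k
    # WITHOUT replacement, so no observation is ever duplicated.
    return rng.sample(data, len(data) - k)

rng = random.Random(0)
data = list(range(100))

boot = bootstrap_sample(data, rng)
sub = leave_k_out_sample(data, k=10, rng=rng)

# The bootstrap sample contains duplicates (on average only ~63% of
# observations are unique); the sub-sample contains none.
print(len(boot) > len(set(boot)))  # duplicates present in bootstrap sample
print(len(sub) == len(set(sub)))   # all sub-sample observations unique
```

This duplication is exactly the mechanism the paper identifies: replicated extreme observations can masquerade as an extra latent class, which sampling without replacement avoids.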

Original language: English
Pages (from-to): 1-11
Number of pages: 11
Journal: Communications in Statistics: Simulation and Computation
Early online date: 10 Mar 2017
DOIs
Publication status: Published - 2017

Keywords

  • Finite mixture models
  • Leave-k-out cross-validation
  • Model validation
  • Nonparametric bootstrap
  • Regression mixture models
