Understanding, Predicting and Mitigating Web Survey Breakoffs

  • Zeming Chen

Student thesis: PhD


Web survey respondents quit partway through the survey more frequently than respondents in other survey modes. This premature exit is called survey breakoff. It causes missing data, reduces sample size, lowers statistical power, and sometimes biases survey estimates. Using experiments, statistical models and simulations, this thesis contributes to the understanding, prediction and mitigation of web survey breakoffs. It tackles breakoffs at three stages of survey data collection: before, during and after the survey.
Chapter 4 focuses on the survey design stage by randomly allocating survey respondents to one of the filter question formats and one of six orders of the question topics. It shows that presenting all filter questions before showing any follow-ups (the grouped filter question format) postpones breakoff, compared to presenting each filter question paired with its follow-ups (the interleafed format). However, as respondents answer more questions, the breakoff rate in the grouped format quickly catches up with that in the interleafed format. Additionally, more breakoffs are expected at the points where a new topic is introduced, and the insurance-related topic has more breakoffs than the topics about clothing purchases and utility payments.
Chapter 5 predicts breakoff during the survey using seven statistical models (traditional and LASSO Cox, traditional and LASSO logistic regression, support vector machine, random forest, and gradient boosting) and four types of predictors: (1) respondents’ demographics, (2) time-varying variables (whose values change from question to question) coded concurrently, (3) time-varying variables coded cumulatively, and (4) the three previous types combined. Gradient boosting gives the best breakoff predictions, whereas the Cox survival model does not improve prediction further, even though it accounts for the clustered structure of the breakoff data (questions clustered within respondents). Time-varying variables improve the prediction of breakoff most when they are coded concurrently.
Chapter 6 investigates different strategies for adjusting for breakoff after the survey. Four methods are applied to simulated data in which four breakoff rates and three breakoff mechanisms are manipulated, and the methods’ ability to mitigate breakoff bias is compared. Multiple imputation outperforms the other three methods employed in the study, and the cause of breakoff influences the effectiveness of the compensation methods more than the breakoff rate does.
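As an illustration of the Chapter 5 distinction between time-varying variables coded concurrently and cumulatively, the following is a minimal sketch and not the thesis's own code. The variable (per-question response time), the function name code_time_varying, and the reading of "cumulative" as a running total are all assumptions made for this example; the thesis may define cumulative coding differently (for instance as a running mean).

```python
from itertools import accumulate


def code_time_varying(values):
    """Return (concurrent, cumulative) codings of a per-question variable.

    Hypothetical helper for illustration only.
    concurrent: the value observed at each question on its own.
    cumulative: assumed here to be the running total up to each question.
    """
    concurrent = list(values)
    cumulative = list(accumulate(values))
    return concurrent, cumulative


# Made-up response times (seconds) for one respondent's first four questions.
response_times = [12.0, 8.5, 30.0, 4.5]
concurrent, cumulative = code_time_varying(response_times)

for question, (now, total) in enumerate(zip(concurrent, cumulative), start=1):
    print(f"question {question}: concurrent={now}, cumulative={total}")
```

Under these assumptions, a model using the concurrent coding sees only what happened at the current question, while the cumulative coding summarises the respondent's whole history up to that point.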
Date of Award: 1 Aug 2023
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Natalie Shlomo & Alexandru Cernat
