TY - JOUR
T1 - Understanding the Predictors of Missing Location Data to Inform Smartphone Study Design: Observational Study
AU - Beukenhorst, Anna L
AU - Sergeant, Jamie C
AU - Schultz, David M
AU - McBeth, John
AU - Yimer, Belay B
AU - Dixon, Will G
N1 - Funding Information:
We thank Sabine van der Veer for providing feedback on this manuscript. ALB is supported by a Medical Research Council Doctoral Training Partnership grant (MR/N013751/1). Cloudy with a Chance of Pain was supported by Versus Arthritis (grant reference 21225). This research was further supported by the Centre for Epidemiology Versus Arthritis (grant 21755) and the National Institute for Health Research Manchester Biomedical Research Centre.
Publisher Copyright:
© Anna L Beukenhorst, Jamie C Sergeant, David M Schultz, John McBeth, Belay B Yimer, Will G Dixon.
PY - 2021/11/16
Y1 - 2021/11/16
N2 - Background: Smartphone location data can be used for observational health studies (to determine participant exposure or behavior) or to deliver a location-based health intervention. However, missing location data are more common when using smartphones compared to when using research-grade location trackers. Missing location data can affect study validity and intervention safety. Objective: The objective of this study was to investigate the distribution of missing location data and its predictors to inform design, analysis, and interpretation of future smartphone (observational and interventional) studies. Methods: We analyzed hourly smartphone location data collected from 9665 research participants on 488,400 participant days in a national smartphone study investigating the association between weather conditions and chronic pain in the United Kingdom. We used a generalized mixed-effects linear model with logistic regression to identify whether a successfully recorded geolocation was associated with the time of day, participants’ time in study, operating system, time since previous survey completion, participant age, sex, and weather sensitivity. Results: For most participants, the app collected a median of 2 out of a maximum of 24 locations (1760/9665, 18.2% of participants), no location data (1664/9665, 17.2%), or complete location data (1575/9665, 16.3%). The median locations per day differed by the operating system: participants with an Android phone most often had complete data (a median of 24/24 locations) whereas iPhone users most often had a median of 2 out of 24 locations. The odds of a successfully recorded location for Android phones were 22.91 times higher than those for iPhones (95% CI 19.53-26.87). The odds of a successfully recorded location were lower during weekends (odds ratio [OR] 0.94, 95% CI 0.94-0.95) and nights (OR 0.37, 95% CI 0.37-0.38), if time in study was longer (OR 0.99 per additional day in study, 95% CI 0.99-1.00), and if a participant had not used the app recently (OR 0.96 per additional day since last survey entry, 95% CI 0.96-0.96). Participant age and sex did not predict missing location data. Conclusions: The predictors of missing location data reported in our study could inform app settings and user instructions for future smartphone (observational and interventional) studies. These predictors have implications for analysis methods to deal with missing location data, such as imputation of missing values or case-only analysis. Health studies using smartphones for data collection should assess context-specific consequences of high missing data, especially among iPhone users, during the night and for disengaged participants.
KW - Data analysis
KW - Digital epidemiology
KW - Environmental exposures
KW - Geolocation
KW - Global positioning system
KW - Location data
KW - Missing data
KW - Mobile application
KW - Mobile health
KW - Mobile phone
KW - Smartphones
UR - https://doi.org/10.2196/28857
U2 - 10.2196/28857
DO - 10.2196/28857
M3 - Article
C2 - 34783661
SN - 2291-5222
VL - 9
JO - JMIR mHealth and uHealth
JF - JMIR mHealth and uHealth
IS - 11
M1 - e28857
ER -