Digital Divides in the Era of Big Data: New Dimensions and Measurements

  • Luis Rivera Illingworth

Student thesis: PhD


The various digital divides have evolved over time and, from their earliest days, have been measured with 'traditional' data sources. These data sources face challenges of availability, completeness, accuracy, relevance, timeliness and accessibility, which in turn create challenges for existing measures of digital divides and for the policies that rely on such measures. The aim of this thesis is to investigate the intersection between digital divides and big data, in order to understand how big data can be used to measure digital divides. Following a systematic literature review (SLR) on big data and digital divides, this study develops a conceptual framework, operationalised through a social data science methodology (SDSM), for conducting research that uses big data to measure digital divides. The SLR explores how big data impacts the way we conceptualise and measure divides. The results confirm the relevance of the topic and reveal the use of big data for measuring both traditional and big data-related divides. Two main ways in which big data is used to measure divides are identified: as a complement to traditional sources or as a sole source. Research gaps that help define an initial research agenda on measuring divides using big data are also identified. The conceptual framework and the SDSM are used to conduct two empirical cases. The first case analyses broadband divides using aggregated crowdsourced big data, based on an online survey of connectivity speeds from many millions of users. The results of the analyses confirm the existence of broadband divides between high-, middle- and low-income countries. Big data provides a more complete and accurate picture of broadband divides between countries, alongside other benefits. Big data also offers new insights into divides between fixed and mobile networks as well as between download and upload speeds. Big data is also used to calculate two broadband indices and to measure readiness for broadband.
The second case analyses knowledge divides using billions of Wikipedia records generated as a by-product of the consumption of content. The results expose knowledge disparities between countries based on their income and region, with Wikipedia data being more available, accurate in some ways, timely, and accessible compared to traditional data. In addition, these data allow the identification of detailed consumption volumes, per capita estimates and device preferences, as well as the analysis of language divides that cannot be measured with traditional sources. Combined with Wikipedia production data, these data are used to create a new index that provides additional insights into knowledge divides. Overall, the results of this study show that big data is changing the way we conceptualise and measure digital divides. The empirical cases demonstrate that big data can be used as a complement to traditional measures, with some advantages when compared to traditional data. Big data offers new insights into digital divides that have not been measured due to the limitations of traditional data. Additional information that can be used to inform policy decisions, such as the estimation of indices, can be obtained thanks to the additional data points included in these big data sources. Nevertheless, the limitations of big data are also acknowledged, thus helping analysts decide on the use of these types of sources when measuring digital divides and for policy making. A Big Data Usage Decision Tool is presented for use as all or part of ex-ante evaluation tools.
Date of Award: 1 Aug 2022
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Richard Heeks & Jaco Renken


  • big data divides
  • data intensive development
  • social data science
  • big data
  • digital divides
  • digital development
