Knowledge Divides in the Era of Big Data: Using Wikipedia Data to Identify and Measure Divides across Countries and Languages

Alfonso Rivera-Illingworth*, Richard Heeks, Jaco Renken

*Corresponding author for this work

Research output: Preprint/Working paper > Working paper

Abstract

Knowledge is fundamental to societal development and individual wellbeing, with digital technologies becoming crucial for its production and consumption. This study explores the use of Wikipedia log files as a big data source to measure knowledge divides, focusing on knowledge consumption. Using the Wikipedia API to extract billions of pageviews, this research measures consumption volumes, per capita estimates, and technological divides, broken down by income level and region. It also addresses language divides in knowledge consumption by combining Wikipedia data with traditional language data. In addition, a knowledge divides index is estimated using a Wikipedia consumption indicator and an existing production indicator.

This study also analyses the qualities of Wikipedia big data using a conceptual framework. Compared to traditional datasets, Wikipedia data offers increased geographical availability, cost-effectiveness, improved accuracy in some aspects, timeliness, and accessibility. However, it is less complete for measuring sociodemographic dimensions and reflects Western knowledge representation, with potential biases in content production and consumption profiles.

The research identifies implications for development and policy, suggesting Wikipedia data should complement other measures in multidimensional knowledge indicators. These can help policymakers track countries' progress and identify underrepresented languages. Despite some limitations, Wikipedia big data proves valuable for measuring knowledge divides when used alongside traditional sources, offering insights into disparities previously difficult to measure.
Original language: English
Place of Publication: Manchester
Publication status: Published - 2025

Publication series

Name: GDI Digital Development Working Papers
Publisher: Centre for Digital Development
No.: 112

Research Beacons, Institutes and Platforms

  • Global inequalities
