Abstract
Knowledge is fundamental to societal development and individual wellbeing, and digital technologies have become crucial for its production and consumption. This study explores the use of Wikipedia log files as a big data source to measure knowledge divides, focusing on knowledge consumption. Using the Wikipedia API to extract billions of pageviews, this research measures consumption volumes, per capita estimates and technological divides, broken down by income level and region. It also addresses language divides in knowledge consumption by combining Wikipedia data with traditional language data. In addition, a knowledge divides index is estimated using a Wikipedia consumption indicator and an existing production indicator.
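The per capita and index steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's method: the abstract does not say how the index is constructed, so the min-max normalisation and the equal weighting of the consumption and production indicators are assumptions, and the function names and the dictionary inputs (country code mapped to a value) are hypothetical.

```python
from typing import Dict


def per_capita(pageviews: Dict[str, float], population: Dict[str, float]) -> Dict[str, float]:
    """Pageviews per person for every country present in both inputs.

    Hypothetical helper: countries with a missing or zero population are skipped.
    """
    return {
        country: views / population[country]
        for country, views in pageviews.items()
        if population.get(country, 0) > 0
    }


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values to [0, 1]; all-equal inputs map to 0.0 (assumed convention)."""
    if not values:
        return {}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def divides_index(consumption: Dict[str, float], production: Dict[str, float]) -> Dict[str, float]:
    """Assumed composite index: equal-weighted mean of the two normalised indicators.

    Only countries that have both a consumption and a production value are scored.
    """
    common = consumption.keys() & production.keys()
    c = min_max({k: consumption[k] for k in common})
    p = min_max({k: production[k] for k in common})
    return {k: (c[k] + p[k]) / 2 for k in common}
```

Normalising each indicator before averaging keeps a raw-count indicator from dominating one measured on a much smaller scale; a real index might instead use different weights, z-scores or a geometric mean.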
This study analyses Wikipedia big data qualities using a conceptual framework. Compared to traditional datasets, Wikipedia data offers increased geographical availability, cost-effectiveness, improved accuracy in some aspects, timeliness, and accessibility. However, it is less complete for measuring sociodemographic dimensions and reflects Western knowledge representation, with potential biases in content production and consumption profiles.
The research identifies implications for development and policy, suggesting that Wikipedia data should complement other measures in multidimensional knowledge indicators. Such indicators can help policymakers track countries' progress and identify underrepresented languages. Despite some limitations, Wikipedia big data proves valuable for measuring knowledge divides when used alongside traditional sources, offering insights into disparities that were previously difficult to measure.
Original language | English |
---|---|
Place of Publication | Manchester |
Publication status | Published - 2025 |
Publication series
Name | GDI Digital Development Working Papers |
---|---|
Publisher | Centre for Digital Development |
No. | 112 |
Research Beacons, Institutes and Platforms
- Global inequalities