Corpus-based Wikipedia Studies: Theoretical and methodological challenges for translation scholars

Activity: Talk or presentation (Invited talk)


Abstract: Corpus-based methodologies are now well established in Translation Studies as approaches which offer a valuable means of examining large bodies of text in a more systematic and efficient manner than would be possible by hand and eye alone (Bernardini and Kenny 2020). The potential of such methodologies for the study of translation in Wikipedia, however, has yet to be sufficiently explored. In this presentation, I will argue that the construction and analysis of large corpora of Wikipedia content can not only dramatically improve our ability to engage with the scale of the encyclopedia platform as a data source, but also open up several exciting new avenues for research. Specifically, through investigation of a corpus of Wikipedia Talk pages developed as part of a UK research council-funded project, I will show how these resources can be used to provide insight into the features and functions of ‘translation talk’ (Davier 2019; Pritzker 2014) and other forms of metalinguistic discussion as they appear across many thousands of individual comments and interactions between members of the multilingual volunteer community. Discussion of this case study will also highlight a number of challenges shaping the application of corpus-based methodologies in a Wikipedia context, and encourage critical reflection on the limits and limitations of this approach to the analysis of Wikipedia translation.
Period: 15 Dec 2021
Event title: Understanding Wikipedia’s Dark Matter
Event type: Conference
Location: Hong Kong, Hong Kong
Degree of Recognition: International


  • Wikipedia
  • translation
  • corpus-based methodologies