Corpus-based dialectometry: A methodological sketch

Benedikt Szmrecsanyi

Research output: Contribution to journal, Article (peer-reviewed)

Abstract

In this paper, I introduce methods for using corpora to explore aggregate linguistic distances between dialects or varieties as a function of properties of geographic space. The paper describes the steps necessary to obtain an appropriate corpus-based dataset (a so-called 'distance matrix') and then discusses several cartographic visualisation techniques (network maps, continuum maps and cluster maps) for projecting aggregate linguistic relationships onto geography. In addition, the paper sketches some statistical methods for quantifying these relationships. By way of example, a case study draws on the Freiburg Corpus of English Dialects, a major dialect corpus that samples more than thirty traditional dialects of English from all over Great Britain. Focusing on regional variation in morphosyntax and drawing on the text frequencies of several dozen features, the study probes joint linguistic variability among the dialects sampled in the corpus. © Edinburgh University Press.
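To make the general workflow concrete (feature frequencies per dialect, then a linguistic distance matrix, then a comparison with geography), here is a minimal sketch in plain Python. It is not the paper's own code. The site names, counts, corpus sizes and coordinates are invented toy values, and the specific choices (frequencies normalised per 10,000 words, a log10(x + 1) transform, Euclidean distance between feature profiles, great-circle geographic distance, and a Pearson correlation over pairs of sites) are assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy input: raw counts of three morphosyntactic features per site,
# plus each site's corpus size in words. All numbers are invented for illustration.
COUNTS = {
    "site_A": [12, 40, 3],
    "site_B": [15, 38, 5],
    "site_C": [30, 10, 20],
    "site_D": [28, 12, 18],
}
WORDS = {"site_A": 20000, "site_B": 25000, "site_C": 18000, "site_D": 22000}
# Hypothetical (latitude, longitude) coordinates in degrees.
COORDS = {
    "site_A": (50.0, -4.0),
    "site_B": (50.5, -3.5),
    "site_C": (55.0, -2.0),
    "site_D": (55.5, -1.5),
}


def normalise(counts, n_words, per=10_000):
    """Frequency per `per` words, then log10(x + 1) to damp very frequent features (assumed transform)."""
    return [math.log10(c / n_words * per + 1) for c in counts]


def euclidean(u, v):
    """Euclidean distance between two feature-frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def distance_matrix(items, dist):
    """Symmetric N x N distance matrix as a nested dict; the diagonal is zero."""
    names = list(items)
    return {a: {b: dist(items[a], items[b]) for b in names} for a in names}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


profiles = {s: normalise(COUNTS[s], WORDS[s]) for s in COUNTS}
ling = distance_matrix(profiles, euclidean)  # linguistic distance matrix
geo = distance_matrix(COORDS, haversine_km)  # geographic distance matrix

# Correlate the two matrices over each unordered pair of sites (upper triangle only).
pairs = list(combinations(sorted(COUNTS), 2))
r = pearson([geo[a][b] for a, b in pairs], [ling[a][b] for a, b in pairs])
print(f"r(geographic, linguistic) = {r:.2f}")
```

The toy data are built so that geographically close sites also have similar feature profiles, so `r` comes out strongly positive. Because the cells of a distance matrix are not independent observations, a plain Pearson coefficient like this is descriptive only; assessing significance would need a permutation-based procedure such as a Mantel test. The linguistic matrix `ling` is also the kind of input that techniques such as multidimensional scaling (commonly used for continuum maps) or hierarchical clustering (for cluster maps) typically take.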
Original language: English
Pages (from-to): 45-76
Number of pages: 32
Journal: Corpora
Volume: 6
Issue number: 1
Publication status: Published, May 2011
