Alts, Abbreviations, and AKAs: Historical Onomastic Variation and Automated Named Entity Recognition

James O. Butler, Christopher E. Donaldson, Joanna E. Taylor, Ian N. Gregory

Research output: Contribution to journal, Article, peer-review

Abstract

Accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardized into single officially recognized forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geo-taggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitized corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardization of place-name spellings. It suggests how procedural developments may be undertaken to account for such geo-referential issues in the Named Entity Recognition (NER) strategies employed by future projects. 
The benefits of such multigenre corpora for completing onomastic records are also shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardization (an aspect not typically accounted for in traditional onomastic study) to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.
Original language: English
Pages (from-to): 58-81
Number of pages: 24
Journal: Journal of Map and Geography Libraries
Volume: 13
Issue number: 1
Early online date: 11 May 2017
Publication status: Published - 2017

Keywords

  • GIS
  • environmental analysis
  • gazetteers
  • historical linguistics
  • language variation
  • linguistic studies
  • name studies
