Abstract
Accurate automated identification of named places is a major concern for
scholars in the digital humanities, and especially for those engaged in
research that depends upon the gazetteer-led recognition of specific
aspects. The field of onomastics examines the linguistic roots and
historical development of names, which have for the most part only
standardized into single officially recognized forms since the late
nineteenth century. Even slight spelling variations can introduce errors
in geotagging techniques, and these differences in place-name spellings
are thus vital considerations when seeking high rates of correct
geospatial identification in historical texts. This article offers an
overview of typical name-based variation that can cause issues in the
accurate geotagging of any historical resource. The article argues that
careful study and documentation of these variations can assist in the
development of more complete onymic records, which in turn may inform
geo-taggers through a cycle of variational recognition. It demonstrates
how patterns in regional naming variation and development, across both
specific and generic name elements, can be identified through the
historical records of each known location. The article uses examples
taken from a digitized corpus of writing about the English Lake
District, a collection of 80 texts that date from between 1622 and 1900.
Four of the more complex spelling-based problems encountered during the
creation of a manual gazetteer for this corpus are examined.
Specifically, the article demonstrates how and why such variation must
be expected, particularly in the years preceding the standardization of
place-name spellings. It suggests how procedural developments may be
undertaken to account for such geo-referential issues in the Named
Entity Recognition (NER) strategies employed by future projects.
Similarly, the benefits of such multigenre corpora in helping to
complete onomastic records are also shown via examples of new name
forms discovered for prominent sites in the Lake District. This focus is
accompanied by a discussion of the influence of literary works on
place-name standardization (an aspect not typically accounted for in
traditional onomastic study) to illustrate the extent to which authorial
interests in regional toponymic histories can influence linguistic
development.
Original language | English |
---|---|
Pages (from-to) | 58-81 |
Number of pages | 24 |
Journal | Journal of Map and Geography Libraries |
Volume | 13 |
Issue number | 1 |
Early online date | 11 May 2017 |
Publication status | Published - 2017 |
Keywords
- GIS
- environmental analysis
- gazetteers
- historical linguistics
- language variation
- linguistic studies
- name studies