Identifying Author Heritage Using Surname Data: An Application for Russian Surnames

Research output: Contribution to journal › Article › peer-review


This research paper puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary-based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) users in 2015, we develop a surname-based identification method and apply it to infer Russian heritage from suffix-based morphological regularities. The method is developed conceptually and is tested in an under-sampled control set. Identification based on surname morphology is then complemented by using first-name data to eliminate false positive results. The method achieves 98% precision and 94% recall, superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be used to overcome long-standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organisations, regions and countries.
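The two-stage idea in the abstract (a surname suffix match, then a first-name check to remove false positives) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, the first-name veto list, and the function names `surname_matches`, `infer_heritage` and `precision_recall` are all assumptions made for illustration, and the paper's actual rules are not given in this record.

```python
# Illustrative suffix list: an assumption for this sketch, not the paper's rule set.
RUSSIAN_SUFFIXES = ("ov", "ova", "ev", "eva", "in", "ina", "sky", "skaya")

# Hypothetical first names used to veto surname matches (false-positive filter).
NON_RUSSIAN_FIRST_NAMES = {"martin", "colin", "kevin"}


def surname_matches(surname: str) -> bool:
    """Return True if the surname ends with one of the illustrative suffixes."""
    return surname.lower().endswith(RUSSIAN_SUFFIXES)


def infer_heritage(first_name: str, surname: str) -> bool:
    """Apply the surname suffix test, then drop matches vetoed by first name."""
    if not surname_matches(surname):
        return False
    return first_name.lower() not in NON_RUSSIAN_FIRST_NAMES


def precision_recall(predicted, actual):
    """Compute (precision, recall) from two equal-length sequences of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if a and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

On made-up records, `infer_heritage("Anna", "Ivanova")` returns `True`, while `infer_heritage("Colin", "Martin")` returns `False`: the surname ends in "in" and so passes the suffix test, but the first name vetoes it. A surname whose ending is missing from the list is not detected, which lowers recall rather than precision.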
Original language: English
Journal: Journal of the Association for Information Science and Technology
Early online date: 25 Jan 2019
Publication status: Published - 2019

Keywords


  • Surname analysis
  • heritage
  • identification
  • ethnicity
  • Russia

Research Beacons, Institutes and Platforms

  • Manchester Institute of Innovation Research

