Evaluating the usefulness of embedding phonetic representations into an authorship analysis-based framework for the comparison of spoken data

James Tompkinson, Andrea Nini

Research output: Contribution to conference, Abstract, peer-review

Abstract

“Higher-order” linguistic features such as lexis, grammar and morphology are frequently analysed in forensic speaker comparison (FSC) cases (Gold and French, 2011), but assessment of these features arguably lacks a formalised comparison framework. Furthermore, the question of how the assessment of “higher-order” features could be combined with other phonetic features in FSC assessments is yet to be comprehensively addressed.

Using transcribed speech data from the West Yorkshire Regional English Database (Gold et al., 2018), we applied two well-known authorship analysis methods within the likelihood ratio framework, Cosine Delta (Ishihara, 2021) and Phi n-gram tracing (Nini, 2023), to assess the speaker discriminatory value of frequent words (Sergidou et al., 2023) and word-level n-grams. We also applied both methods to transcripts in which vocalised hesitation markers had been coded with phonetic information, to assess whether “higher-order” linguistic features can be combined with segmental phonetic analysis to achieve greater speaker discriminatory power.
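As a rough illustration of the kind of score Cosine Delta produces, the sketch below z-scores the relative frequencies of the most frequent tokens across a set of transcripts and takes one minus the cosine similarity between two z-score vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy transcripts, the sample labels, the hesitation tokens "um" and "er", and all function names are hypothetical, and the step that converts scores into likelihood ratios is not shown.

```python
import math
from collections import Counter


def most_frequent_words(samples, n):
    """Return the n most frequent tokens across all samples (the feature set)."""
    counts = Counter(tok for tokens in samples for tok in tokens)
    return [word for word, _ in counts.most_common(n)]


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary item in one tokenised sample."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def zscore_columns(rows):
    """z-score each feature (column) across all samples (rows)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    # Features with zero variance carry no information; map them to 0.0.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


def cosine_delta(u, v):
    """Cosine Delta: 1 - cosine similarity of two z-score vectors (range 0 to 2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for an all-zero vector
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy transcripts; hesitation markers are kept as ordinary tokens.
samples = {
    "A1": "um i think it was um like the blue one".split(),
    "A2": "um it was like um the other one i think".split(),
    "B1": "er well the thing is er i saw the car".split(),
}

labels = list(samples)
vocab = most_frequent_words(list(samples.values()), 5)
z = zscore_columns([relative_freqs(samples[k], vocab) for k in labels])
z_by_label = dict(zip(labels, z))

# Pairwise distances: lower values mean more similar frequency profiles.
distances = {
    (a, b): cosine_delta(z_by_label[a], z_by_label[b])
    for i, a in enumerate(labels)
    for b in labels[i + 1:]
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

In a likelihood ratio setting, scores like these would be computed for many same-speaker and different-speaker pairs and then calibrated; that stage is deliberately left out here because the abstract does not describe it.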

Our findings support previous research showing that methods used to discriminate between authors can be usefully applied to transcribed speech data. We also find that these methods can be enhanced by embedding segmental phonetic information within transcripts. For Delta, hesitation markers perform as well as function words. For Phi n-gram tracing, an analysis restricted to n-grams containing only hesitation markers outperforms a classic word n-gram analysis.

References

Gold, E. (2020). WYRED - West Yorkshire Regional English Database 2016-2019. [data collection]. UK Data Service. SN: 854354, DOI: 10.5255/UKDA-SN-854354

Ishihara, S. (2021). Score-based likelihood ratios for linguistic text evidence with a bag-of-words model. Forensic Science International, 327, 110980.

Nini, A. (2023). A Theory of Linguistic Individuality for Authorship Analysis. Elements in Forensic Linguistics. Cambridge University Press.

Sergidou, E. K., Scheijen, N., Leegwater, J., Cambier-Langeveld, T., & Bosma, W. (2023). Frequent-words analysis for forensic speaker comparison. Speech Communication, 150, 1-8.
Original language: English
Publication status: Published - 26 Jun 2024
Event: 5th European Conference of the IAFLL – International Association for Forensic and Legal Linguists - Aston University, Birmingham, United Kingdom
Duration: 24 Jun 2024 to 27 Jun 2024
Conference number: 5
https://www.aston.ac.uk/research/forensic-linguistics/iafll-regional-conference

Conference

Conference: 5th European Conference of the IAFLL – International Association for Forensic and Legal Linguists
Country/Territory: United Kingdom
City: Birmingham
Period: 24/06/24 to 27/06/24

  • IAFPA Research Grant

    Tompkinson, J. (Recipient) & Nini, A. (Recipient), 28 Feb 2024

    Prize: Prize (including medals and awards)
