MphayaNER: Named Entity Recognition for Tshivenda

  • Rendani Mbuvha
  • David I. Adelani
  • Tendani Mutavhatsindi
  • Tshimangadzo Rakhuhu
  • Aluwani Mauda
  • Tshifhiwa Joshua Maumela
  • Andisani Masindi
  • Seani Rananga
  • Vukosi Marivate
  • Tshilidzi Marwala

Research output: Preprint/Working paper (Preprint)

Abstract

Named Entity Recognition (NER) plays a vital role in various Natural Language Processing tasks such as information retrieval, text classification, and question answering. However, NER can be challenging, especially in low-resource languages with limited annotated datasets and tools. This paper adds to the effort of addressing these challenges by introducing MphayaNER, the first Tshivenda NER corpus in the news domain. We establish NER baselines by fine-tuning state-of-the-art models on MphayaNER. The study also explores zero-shot transfer between Tshivenda and other related Bantu languages, with chiShona and Kiswahili showing the best results. Augmenting MphayaNER with chiShona data was also found to improve model performance significantly. Both MphayaNER and the baseline models are made publicly available.
Original language: English
Publisher: arXiv
Pages: 1-5
Number of pages: 5
Publication status: Published, 8 Apr 2023

Keywords

  • cs.CL
  • cs.AI
