Partial Annotation Learning for Biomedical Entity Recognition

Liangping Ding*, Giovanni Colavizza, Zhixiong Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Named Entity Recognition (NER) is a key task to support biomedical research. In Biomedical Named Entity Recognition (BioNER), obtaining high-quality expert-annotated data is laborious and expensive, which has led to the development of automatic approaches such as distant supervision. However, both manually and automatically generated data often suffer from the unlabeled entity problem, whereby many entity annotations are missing, degrading the performance of full annotation NER models. To address this issue, we systematically explore the efficacy of partial annotation learning methods for BioNER, evaluating them across a range of simulated scenarios of missing entity annotations. Furthermore, we propose a partial annotation learning model, TS-PubMedBERT-Partial-CRF. We standardize 16 BioNER corpora covering five entity types to establish a gold standard. We compare against the state-of-the-art partial annotation model EER-PubMedBERT, the widely acknowledged partial annotation model BiLSTM Partial-CRF, and the state-of-the-art full annotation learning BioNER model, the PubMedBERT tagger. Results show that partial annotation learning-based methods can effectively learn from biomedical corpora with missing entity annotations. Our proposed model outperforms the alternatives and, specifically, outperforms the PubMedBERT tagger by 38% in F1-score under high missing entity rates. Moreover, the recall of entity mentions achieved by our model is competitive with the upper bound observed on the fully annotated dataset. We have published our data, source code and training records at https://github.com/possible1402/partial_annotation_learning.
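To make the unlabeled entity problem concrete, the following is a minimal sketch of how missing entity annotations might be simulated on a BIO-tagged sentence. It is an illustration only and is not taken from the published source code: the function names `entity_spans` and `drop_entities`, the `missing_rate` parameter, the span-level random removal scheme, and the entity type names in the usage note are all assumptions made for this example.

```python
import random
from typing import List, Tuple


def entity_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each entity in a BIO sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that was still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag.startswith("I-") and start is not None:
            # Continuation of the currently open entity.
            continue
        else:
            # "O" (or a stray "I-") closes any open entity.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def drop_entities(tags: List[str], missing_rate: float, rng: random.Random) -> List[str]:
    """Relabel each whole entity span as "O" with probability `missing_rate`.

    Hypothetical simulation scheme: the removed entities become
    indistinguishable from true non-entity tokens, which is the
    unlabeled entity problem.
    """
    out = list(tags)
    for start, end in entity_spans(tags):
        if rng.random() < missing_rate:
            for i in range(start, end):
                out[i] = "O"
    return out
```

For example, calling `drop_entities(["B-Chemical", "I-Chemical", "O", "B-Disease", "O"], 0.5, random.Random(0))` removes each of the two entities independently with probability 0.5. A full annotation model would treat every resulting "O" as a true negative, whereas a partial annotation learning method treats such tokens as potentially unlabeled.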
Original language: English
Pages (from-to): 1-10
Journal: IEEE Journal of Biomedical and Health Informatics
Publication status: Published - 23 Sept 2024

Keywords

  • Biomedical Named Entity Recognition
  • Conditional Random Field
  • Partial Annotation Learning
  • Pre-trained Language Model

Research Beacons, Institutes and Platforms

  • Manchester Institute of Innovation Research
