A Bootstrapped Chinese Biomedical Named Entity Recognition Model Incorporating Lexicons

Liangping Ding, Zhixiong Zhang*, Huan Liu

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Biomedical named entity recognition (BioNER) is a sub-task of named entity recognition that aims to recognize named entities in medical text to support knowledge discovery. In this paper, we propose a bootstrapped model incorporating lexicons, which combines a pretrained language model, semi-supervised learning and external lexicon features to apply BioNER to Chinese medical abstracts. Extensive evaluation shows that our system is competitive with limited annotated training data, surpassing the HMM, CRF, BiLSTM, BiLSTM-CRF and BERT baselines by 54.60%, 37.92%, 55.46%, 48.67% and 7.99%, respectively. The experimental results demonstrate that unsupervised pretraining enables a pretrained language model to achieve strong performance on downstream tasks with only a small amount of annotated data. In addition, semi-supervised learning and external lexicon features can further compensate for insufficient annotated data.
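
The abstract names external lexicon features and bootstrapping but does not say how either is computed. The sketch below is only an illustration under stated assumptions: it derives per-character lexicon tags by forward maximum matching and keeps pseudo-labeled sentences above a confidence threshold. The function names, the B/I/O tag set, the `max_len` window and the 0.9 threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical type: (sentence, predicted tags, model confidence in [0, 1]).
Prediction = Tuple[str, List[str], float]


def lexicon_features(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Tag each character B/I/O by forward maximum matching against a lexicon.

    Assumption: one simple way to build lexicon features; the paper may differ.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first so longer entries beat their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len
        else:
            i += 1
    return tags


def select_pseudo_labels(
    predictions: Sequence[Prediction], threshold: float = 0.9
) -> List[Tuple[str, List[str]]]:
    """Keep unlabeled sentences whose predicted tagging meets the threshold.

    Assumption: a generic self-training filter, not the authors' criterion.
    """
    return [(sent, tags) for sent, tags, conf in predictions if conf >= threshold]


if __name__ == "__main__":
    toy_lexicon = {"糖尿病", "糖尿"}  # toy entries for illustration only
    print(lexicon_features("患者有糖尿病史", toy_lexicon))
```

In a bootstrapping loop of this kind, the selected pseudo-labeled sentences would be added to the training set and the tagger retrained for another round; the lexicon tags would be supplied to the tagger as an extra input feature alongside the characters.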

Original language: English
Pages (from-to): 19-28
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3210
Publication status: Published - 2022
Event: 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2022 - Virtual, Online, Germany
Duration: 23 Jun 2022 → 24 Jun 2022

Keywords

  • Biomedical Named Entity Recognition
  • Bootstrapping
  • Feature Incorporation
  • Pretrained Language Model
  • Semi-Supervised Learning

Research Beacons, Institutes and Platforms

  • Manchester Institute of Innovation Research
