Abstract
Biomedical named entity recognition (BioNER) is a sub-task of named entity recognition that aims to recognize named entities in medical text in order to support knowledge discovery. In this paper, we propose a bootstrapped model incorporating lexicons, which takes advantage of a pretrained language model, semi-supervised learning and external lexicon features to apply BioNER to Chinese medical abstracts. Extensive evaluation shows that our system is competitive with limited annotated training data, surpassing the baselines HMM, CRF, BiLSTM, BiLSTM-CRF and BERT by 54.60%, 37.92%, 55.46%, 48.67% and 7.99%, respectively. The experimental results demonstrate that unsupervised pretraining enables a pretrained language model to achieve strong performance on downstream tasks with only a small amount of annotated data. In addition, semi-supervised learning and external lexicon features can further compensate for insufficient annotated data.
Original language | English |
---|---|
Pages (from-to) | 19-28 |
Number of pages | 10 |
Journal | CEUR Workshop Proceedings |
Volume | 3210 |
Publication status | Published - 2022 |
Event | 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2022 - Virtual, Online, Germany; Duration: 23 Jun 2022 → 24 Jun 2022 |
Keywords
- Biomedical Named Entity Recognition
- Bootstrapping
- Feature Incorporation
- Pretrained Language Model
- Semi-Supervised Learning
Research Beacons, Institutes and Platforms
- Manchester Institute of Innovation Research