Distantly Supervised Named Entity Recognition with Category-Oriented Confidence Calibration

Liangping Ding, Tian Yuan Huang, Huan Liu, Yufei Wang, Zhixiong Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Named entity recognition (NER) plays an important role in extracting valuable information from digital libraries, helping stakeholders take full advantage of large quantities of documents and advancing scholarly knowledge discovery. Nevertheless, few annotated NER datasets target scientific literature outside the medical domain, which restricts the use of the many advanced deep learning models available. As an alternative, distant supervision removes the need for human annotation by automatically generating annotated datasets from external resources such as knowledge bases, although it inevitably introduces noise. In this work, we study noisy-labeled named entity recognition in the distant supervision setting. Most NER systems that handle noisy labels through confidence estimation ignore the fact that a model has different levels of confidence for different categories. We therefore propose a Category-oriented confidence calibration (Coca) strategy with an automatic confidence threshold calculation module. We integrate our method into a teacher-student self-training framework to improve model performance. Our approach achieves promising performance compared with advanced baseline models and can be easily integrated into other confidence-based model frameworks. (Our code is publicly available at: https://github.com/possible1402/BOND_Coca)
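
The idea of category-specific thresholds can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). The abstract does not state how the threshold is calculated, so this sketch assumes the threshold for each category is the mean teacher confidence over tokens predicted in that category. The function names `category_thresholds` and `select_pseudo_labels` and the sample data are hypothetical.

```python
from collections import defaultdict


def category_thresholds(predictions):
    """Return one threshold per category.

    Assumption for illustration only: the threshold is the mean teacher
    confidence over all tokens predicted as that category.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, conf in predictions:
        sums[label] += conf
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def select_pseudo_labels(predictions, thresholds):
    """Keep a pseudo-label only if it clears its own category's threshold.

    Rejected tokens become None so a student model could skip them
    when computing its training loss.
    """
    return [
        label if conf >= thresholds[label] else None
        for label, conf in predictions
    ]


# Hypothetical teacher output: (predicted label, confidence) per token.
teacher_output = [
    ("PER", 0.98), ("PER", 0.90),
    ("ORG", 0.70), ("ORG", 0.50),
    ("O", 0.99), ("O", 0.95),
]

thresholds = category_thresholds(teacher_output)
pseudo_labels = select_pseudo_labels(teacher_output, thresholds)
print(thresholds)
print(pseudo_labels)
```

With a single global threshold of, say, 0.9, every ORG prediction in this toy input would be discarded. With per-category thresholds, the more confident ORG token is kept, because it is judged against the confidence the model typically shows for that category.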

Original language: English
Title of host publication: From Born-Physical to Born-Virtual
Subtitle of host publication: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings
Editors: Yuen-Hsien Tseng, Marie Katsurai, Hoa N. Nguyen
Publisher: Springer Nature
Pages: 46-55
Number of pages: 10
ISBN (Print): 9783031217555
DOIs
Publication status: Published - 2022
Event: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 - Hanoi, Viet Nam
Duration: 30 Nov 2022 to 2 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13636 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022
Country/Territory: Viet Nam
City: Hanoi
Period: 30/11/22 to 2/12/22

Keywords

  • Digital library
  • Distant supervision
  • Named entity recognition
  • Pretrained language model
  • Self-training

Research Beacons, Institutes and Platforms

  • Manchester Institute of Innovation Research
