Topic Detection Using Paragraph Vectors to Support Active Learning in Systematic Reviews

Kazuma Hashimoto, Georgios Kontonatsios, Makoto Miwa, Sophia Ananiadou

Research output: Contribution to journal › Article › peer-review

Abstract

Systematic reviews require expert reviewers to manually screen thousands
of citations in order to identify all articles relevant to the review. Active
learning text classification is a supervised machine learning approach that
has been shown to significantly reduce the manual annotation workload by
semi-automating the citation screening process of systematic reviews. In this
paper, we present a new topic detection method that induces an informative
representation of studies, to improve the performance of the underlying active
learner. Our proposed topic detection method uses a neural network-based
vector space model to capture semantic similarities between documents. We
first represent documents within the vector space, and cluster the documents
into a predefined number of clusters. The centroids of the clusters are
treated as latent topics. We then represent each document as a mixture of
latent topics. For evaluation purposes, we employ the active learning strategy
using both our novel topic detection method and a baseline topic model (i.e.,
Latent Dirichlet Allocation). The results demonstrate that our method
achieves high sensitivity for eligible studies and a significantly reduced
manual annotation cost when compared to the baseline method. This
observation is consistent across two clinical and three public health reviews.
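
The topic detection steps outlined in the abstract (embed documents, cluster them, treat centroids as latent topics, describe each document as a mixture of topics) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the toy 2-D vectors stand in for paragraph vectors, the clustering is plain k-means with a deterministic farthest-point initialisation, and the mixture is a softmax over negative squared distances to the centroids. The abstract does not state how the mixture weights are computed, and the function names are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the returned centroids play the role of latent topics."""
    centroids = init_centroids(vectors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def topic_mixture(vector, centroids):
    """Softmax over negative squared distances; an illustrative choice only."""
    scores = [-squared_distance(vector, c) for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy 2-D stand-ins for document vectors: two well-separated groups.
doc_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [5.0, 5.1], [5.1, 5.0], [5.0, 5.0],
]
topics = kmeans(doc_vectors, k=2)
features = [topic_mixture(v, topics) for v in doc_vectors]
```

Each row of `features` is a vector of topic weights that sums to one. Under the approach the abstract describes, a representation of this kind is what would be passed to the active learner in place of, or alongside, a Latent Dirichlet Allocation topic distribution.
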

Original language: English
Pages (from-to): 59–65
Journal: Journal of Biomedical Informatics
Volume: 62
Early online date: 9 Jun 2016
Publication status: Published - Aug 2016
