Abstract
Systematic reviews require expert reviewers to manually screen thousands
of citations in order to identify all articles relevant to the review. Active
learning text classification is a supervised machine learning approach that
has been shown to significantly reduce the manual annotation workload by
semi-automating the citation screening process of systematic reviews. In this
paper, we present a new topic detection method that induces an informative
representation of studies, to improve the performance of the underlying active
learner. Our proposed topic detection method uses a neural network-based
vector space model to capture semantic similarities between documents. We
first represent documents within the vector space and cluster the documents
into a predefined number of clusters. The centroids of the clusters are
treated as latent topics. We then represent each document as a mixture of
latent topics. For evaluation purposes, we employ the active learning strategy
using both our novel topic detection method and a baseline topic model (i.e.,
Latent Dirichlet Allocation). The results demonstrate that our method
achieves high sensitivity for eligible studies and a significantly reduced
manual annotation cost when compared to the baseline method. This
observation is consistent across two clinical and three public health reviews.
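The pipeline described in the abstract (embed documents, cluster them, treat centroids as latent topics, describe each document as a mixture of those topics) can be illustrated with a minimal sketch. Several parts are assumptions rather than details from the paper: the document vectors are hand-written toy values standing in for learned paragraph vectors, the clustering is plain k-means, and the mixture weights use a softmax over negative Euclidean distances to the centroids. The names `kmeans` and `topic_mixture` are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; the k centroids play the role of latent topics."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            clusters[nearest].append(v)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_vector(members) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def topic_mixture(vector, centroids):
    """Assumed weighting: softmax over negative distances to each centroid."""
    scores = [-math.sqrt(squared_distance(vector, c)) for c in centroids]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy 2-D stand-ins for paragraph vectors, forming two separated groups.
doc_vectors = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
centroids = kmeans(doc_vectors, k=2)
features = [topic_mixture(v, centroids) for v in doc_vectors]
```

Each row of `features` sums to 1 and could serve as the input representation for a classifier inside an active learning loop. In practice the toy vectors would be replaced by embeddings from a trained document embedding model.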
Original language | English |
---|---|
Pages (from-to) | 59–65 |
Journal | Journal of Biomedical Informatics |
Volume | 62 |
Early online date | 9 Jun 2016 |
DOIs | |
Publication status | Published - Aug 2016 |
Fingerprint
Dive into the research topics of 'Topic Detection Using Paragraph Vectors to Support Active Learning in Systematic Reviews'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Supporting Evidence Based Public Health Interventions using Text Mining
Ananiadou, S. (PI) & Mcnaught, J. (CoI)
31/03/14 → 31/03/17
Project: Research
Impacts
- Saving Time and Costs for Evidence-based Public Health Interventions: Text Mining Tool RobotAnalyst
Ananiadou, S. (Participant), Mcnaught, J. (Participant), Goulermas, J. (Participant), (Participant) & (Participant)
Impact: Health and wellbeing, Economic, Technological