Topic Modelling vs Distant Supervision: A Comparative Evaluation based on the Classification of Parliamentary Enquiries

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


Abstract

We investigate two types of approaches to text classification as a means of enriching the categorisation of Parliamentary enquiries recorded by the UK House of Commons Library. One is an unsupervised approach, namely topic modelling; the other is a supervised approach based on weakly labelled data, namely distant supervision. Models were trained on two types of feature sets: one based only on bag of words, and the other combining bag of words with structured metadata attached to the enquiries. Our results show that topic modelling obtains superior performance on this task, and that incorporating structured metadata as learning features contributes only insignificantly to improving model performance.
Original language: English
Title of host publication: Digital Libraries for Open Knowledge
Publisher: Springer Nature
Volume: 11799
ISBN (Electronic): 978-3-030-30760-8
ISBN (Print): 978-3-030-30759-2
Publication status: Published - 2019

Publication series

Name: Lecture Notes in Computer Science
