HanDeSeT: Hansard Debates with Sentiment Tags

Dataset

Description

A corpus of Hansard UK Parliament Debates for use in the evaluation of sentiment analysis systems.
The corpus consists of 1251 motion-speech units taken from 129 separate debates held in the UK House of Commons between 1997 and 2017.

Each unit comprises a parliamentary speech of up to five utterances and an associated debate motion. Debates comprise between one and 30 speeches, and speeches range in length from 31 to 1049 words, with a mean of 167.8 words. The debates cover a two-decade period from 1997 to 2017 and a wide range of topics, from domestic and foreign affairs to procedural matters concerning the running of the House.

Each motion has two sentiment polarity labels:
1. A manually applied sentiment polarity label; and
2. A label derived from the relationship of the MP who proposes the motion to the Government.

Each speech has two sentiment polarity labels:
1. A speaker-vote label extracted from the division associated with the corresponding debate; and
2. A manually assigned label.

In addition, the following metadata is included with each unit: debate id, speaker party affiliation, motion party affiliation, speaker name, and speaker rebellion rate.
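The structure of a unit described above can be sketched as a small record type. This is only an illustration: every field name, the 0/1 label encoding, and the requirement of at least one utterance are assumptions made here, not the column names or encoding of the published files, and the sample values are invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotionSpeechUnit:
    """One motion-speech unit. All field names are hypothetical."""
    debate_id: str
    motion: str
    utterances: List[str]         # a speech holds up to five utterances
    manual_motion_label: int      # assumed encoding: 1 = positive, 0 = negative
    government_motion_label: int  # derived from the proposer's relationship to the Government
    vote_speech_label: int        # derived from the speaker's vote in the division
    manual_speech_label: int
    speaker_name: str
    speaker_party: str
    motion_party: str
    rebellion_rate: float

    def __post_init__(self) -> None:
        # Assumption: a speech has at least one and at most five utterances.
        if not 1 <= len(self.utterances) <= 5:
            raise ValueError("a speech holds between one and five utterances")

    def speech_text(self) -> str:
        """Join the utterances into a single speech string."""
        return " ".join(self.utterances)


# Invented placeholder values, not taken from the corpus.
example = MotionSpeechUnit(
    debate_id="debate-000",
    motion="That this House approves the placeholder motion.",
    utterances=["I support the motion.", "It is long overdue."],
    manual_motion_label=1,
    government_motion_label=1,
    vote_speech_label=1,
    manual_speech_label=1,
    speaker_name="A. Member",
    speaker_party="Party A",
    motion_party="Party A",
    rebellion_rate=0.0,
)
```

Keeping the four labels as separate fields makes it easy to evaluate a system against either the manual or the automatically derived annotation.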

Manually applied motion labels are approximately evenly balanced; the other labels are slightly skewed towards the positive class.
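A reader who wants to check the class balance of any one label could compute the share of positive labels, as in the minimal sketch below. The function name `positive_share` and the 0/1 encoding (1 = positive) are assumptions for illustration, and the sample list is made up rather than drawn from the corpus.

```python
from typing import Iterable


def positive_share(labels: Iterable[int]) -> float:
    """Return the fraction of labels equal to 1 (assumed to mean positive)."""
    values = list(labels)
    if not values:
        raise ValueError("no labels supplied")
    return sum(1 for v in values if v == 1) / len(values)


# Made-up labels: three positive out of four.
sample_share = positive_share([1, 0, 1, 1])
```

A value near 0.5 indicates a balanced label; a value somewhat above 0.5 indicates a skew towards the positive class.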

Hansard transcript data is used under the Open Parliament Licence V3.0.
Data regarding speaker rebellion rates is taken from the Public Whip, and used under the Open Data Commons Open Database License (ODbL).
Date made available: 22 Feb 2018
Publisher: Mendeley Data

Keywords

  • Artificial Intelligence
  • Political Science
  • Computational Linguistics
  • Data Science
  • Natural Language Processing
