Abstract
Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
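The labelling idea in the abstract can be illustrated with a minimal sketch. It assumes that documents and phrases already have vectors in one shared space, groups the documents, and then labels each cluster with the phrase closest to the cluster centre. Everything concrete here is an assumption made for illustration: the 2-D vectors are made-up numbers rather than paragraph vector output, plain k-means stands in for whatever clustering procedure the paper uses, and the function names (`kmeans`, `label_clusters`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20):
    """Plain k-means with Euclidean distance (a stand-in clustering step).

    Seeds are simply the first k points, so the result is deterministic.
    """
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Move each centre to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = centroid(members)
    return assign, centres


def label_clusters(centres, phrase_vecs):
    """For each cluster centre, pick the most similar phrase in the shared space."""
    return [
        max(phrase_vecs, key=lambda ph: cosine(phrase_vecs[ph], centre))
        for centre in centres
    ]


# Toy co-embedding: made-up 2-D vectors, not real model output.
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
phrase_vecs = {"text mining": [1.0, 0.0], "public health": [0.0, 1.0]}

assignments, centres = kmeans(doc_vecs, k=2)
labels = label_clusters(centres, phrase_vecs)
print(assignments, labels)
```

Because phrases and documents share one vector space in this sketch, labelling reduces to a nearest-neighbour lookup around each cluster centre. In practice the vectors would come from a trained model rather than being written by hand.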
Original language | English |
---|---|
Title of host publication | Proceedings of EACL 2017 |
Pages | 991-1001 |
Number of pages | 11 |
Publication status | Published - Jan 2017 |
Event | European Chapter of the Association for Computational Linguistics - Valencia Conference Center, Valencia, Spain; Duration: 3 Apr 2017 → 7 Apr 2017; Conference number: 15; http://eacl2017.org/ |
Conference
Conference | European Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | EACL |
Country/Territory | Spain |
City | Valencia |
Period | 3/04/17 → 7/04/17 |
Internet address | http://eacl2017.org/ |
Keywords
- descriptive clustering
- co-embeddings
- text mining
- systematic reviews
Fingerprint
Dive into the research topics of 'Distributed Document and Phrase Co-embeddings for Descriptive Clustering'. Together they form a unique fingerprint.
Impacts
- Saving Time and Costs for Evidence-based Public Health Interventions: Text Mining Tool RobotAnalyst
Ananiadou, S. (Participant), Mcnaught, J. (Participant), Goulermas, J. (Participant), (Participant) & (Participant)
Impact: Health and wellbeing, Economic, Technological