Distributed Document and Phrase Co-embeddings for Descriptive Clustering

Motoki Sato, Austin Brockmeier, Georgios Kontonatsios, Tingting Mu, John Goulermas, Junichi Tsujii, Sophia Ananiadou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
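
As an illustration only, the label-selection idea in the abstract (documents and phrases sharing one embedding space, with a phrase chosen to represent each cluster) might be sketched as follows. This is not the authors' implementation: the nearest-phrase-to-centroid rule, the cosine similarity measure, the function names, and the toy two-dimensional vectors are all assumptions made for the example, and the cluster assignments are taken as given rather than computed.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_clusters(doc_vecs, assignments, phrase_vecs):
    """Pick, for each cluster, the phrase closest to the cluster centroid.

    Hypothetical selection rule: the source only states that a co-embedding
    is used to select a descriptive phrase, not how the selection is made.
    """
    members = defaultdict(list)
    for vec, cluster_id in zip(doc_vecs, assignments):
        members[cluster_id].append(vec)

    labels = {}
    for cluster_id, vecs in members.items():
        centre = centroid(vecs)
        labels[cluster_id] = max(
            phrase_vecs, key=lambda phrase: cosine(phrase_vecs[phrase], centre)
        )
    return labels


# Toy co-embedding: documents and phrases live in the same 2-D space.
# These vectors are invented for illustration, not taken from the paper.
doc_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
assignments = [0, 0, 1, 1]
phrase_vecs = {
    "text mining": [1.0, 0.0],
    "systematic reviews": [0.0, 1.0],
    "clustering": [0.7, 0.7],
}

labels = label_clusters(doc_vecs, assignments, phrase_vecs)
print(labels)
```

In practice the vectors would come from a trained model such as the paragraph vector model named in the abstract, and the cluster assignments from whichever clustering step precedes labelling.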
Original language: English
Title of host publication: Proceedings of EACL 2017
Number of pages: 11
Publication status: Published - Jan 2017
Event: European Chapter of the Association for Computational Linguistics - Valencia Conference Center, Valencia, Spain
Duration: 3 Apr 2017 → 7 Apr 2017
Conference number: 15


Conference: European Chapter of the Association for Computational Linguistics
Abbreviated title: EACL


  • descriptive clustering
  • co-embeddings
  • text mining
  • systematic reviews

