GenCompareSum: a hybrid unsupervised summarization method using salience

Jennifer Bishop, Qianqian Xie, Sophia Ananiadou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text summarization (TS) is an important NLP task. Pre-trained Language Models (PLMs) have been used to improve the performance of TS. However, PLMs are limited by their need for labelled training data and by their attention mechanism, which often makes them unsuitable for use on long documents. To this end, we propose a hybrid, unsupervised, abstractive-extractive approach, in which we walk through a document, generating salient textual fragments representing its key points. We then select the most important sentences of the document by choosing the sentences most similar to the generated texts, with similarity calculated using BERTScore. We evaluate the efficacy of generating and using salient textual fragments to guide extractive summarization on documents from the biomedical and general scientific domains. We compare performance on long and short documents using different generative text models, which are fine-tuned to generate relevant queries or document titles. We show that our hybrid approach outperforms existing unsupervised methods, as well as state-of-the-art supervised methods, despite not needing a vast amount of labelled training data.
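The extractive selection step described in the abstract (scoring each document sentence against the generated salient fragments and keeping the best matches) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: a simple token-overlap (Jaccard) similarity stands in for BERTScore, and all function names and parameters are assumptions.

```python
import re


def similarity(a: str, b: str) -> float:
    # Stand-in for BERTScore F1: Jaccard overlap of lowercased word tokens.
    # The paper itself uses BERTScore to compare sentences to fragments.
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_sentences(doc_sentences, generated_fragments, k=3):
    # Score each sentence by its best match against any generated fragment,
    # then return the top-k sentences in their original document order.
    scored = [
        (max(similarity(s, f) for f in generated_fragments), i, s)
        for i, s in enumerate(doc_sentences)
    ]
    top_k = sorted(scored, reverse=True)[:k]
    return [s for _, _, s in sorted(top_k, key=lambda t: t[1])]
```

For example, given fragments that emphasise one topic, the selector keeps the sentences closest to that topic while preserving document order; swapping in BERTScore would only change the `similarity` function.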
Original language: English
Title of host publication: Proceedings of the 21st Workshop on Biomedical Language Processing
Publisher: Association for Computational Linguistics
Pages: 220-240
Publication status: Published - 1 May 2022
