
Compressing Context to Enhance Inference Efficiency of Large Language Models

Research output: Chapter in Book/Conference proceedingConference contributionpeer-review

Abstract

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. However, they face challenges in managing long documents and extended conversations: computational requirements grow significantly, in both memory and inference time, and the context may be truncated when the input exceeds the LLM's fixed context length. This paper proposes Selective Context, a method that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context, making the input more compact. We evaluate our approach on common data sources requiring long-context processing (arXiv papers, news articles, and long conversations) on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, a 50% reduction in context cost yields a 36% reduction in inference memory usage and a 32% reduction in inference time, with only a minor drop of 0.023 in BERTScore and 0.038 in faithfulness across four downstream applications, indicating that our method strikes a good balance between efficiency and performance.
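The core idea of pruning redundant lexical units can be sketched as follows. The paper scores units by their self-information, -log p(unit), under a causal language model and drops the lowest-scoring fraction; as a minimal, dependency-free stand-in, this sketch estimates token probabilities from unigram counts within the text itself (an assumption of this illustration, not the paper's actual LM-based scoring).

```python
import math
from collections import Counter


def compress_context(tokens, keep_ratio=0.5):
    """Keep the most informative `keep_ratio` fraction of tokens.

    Self-information is -log p(token); here p is estimated from
    unigram counts in the input as a crude proxy for a causal LM.
    Frequent (redundant) tokens score low and are pruned first.
    """
    counts = Counter(tokens)
    total = len(tokens)
    scores = [-math.log(counts[t] / total) for t in tokens]
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Indices of the n_keep highest-scoring tokens, restored to
    # their original order so the compressed context stays readable.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


toks = "the cat sat on the mat because the cat was tired".split()
print(compress_context(toks, keep_ratio=0.5))
```

With a 50% keep ratio, repeated low-information tokens such as "the" and "cat" are dropped while singleton content words survive, mirroring the roughly 50% context reduction reported in the abstract.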
Original language: English
Title of host publication: EMNLP 2023: The 2023 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: Proceedings of the Conference
Editors: H. Bouamor, J. Pino, K. Bali
Publisher: Association for Computational Linguistics
Pages: 6342–6353
Number of pages: 12
ISBN (Print): 9788891760608
DOIs
Publication status: Published - Dec 2023
Event: 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore
Duration: 6 Dec 2023 – 10 Dec 2023

Conference

Conference: 2023 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 – 10/12/23

