Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents

Sergio Pelaez, Gaurav Verma, Barbara Ribeiro, Philip Shapira

Research output: Contribution to journal › Article › peer-review

Abstract

We put forward a novel approach that uses a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using the text (5.4 million sentences) of 154,934 US AI patent documents from the United States Patent and Trademark Office (USPTO), we design a semi-automated, human-supervised framework for identifying and labelling public value expressions in these sentences. We develop a GPT-4 prompt that includes definitions, guidelines, examples, and rationales for text classification. We evaluate the labels and rationales produced by GPT-4 using BLEU scores and topic modelling, finding that they are accurate, diverse, and faithful. GPT-4 showed an advanced ability to recognise the public value expressions defined in our framework, and it also used the framework to discover previously unseen public value expressions. The GPT-produced labels are used to train BERT-based classifiers, which then predict labels for sentences across the entire database, achieving high F1 scores on the 3-class (0.85) and 2-class (0.91) classification tasks. We discuss the implications of our approach for conducting large-scale text analyses involving complex and abstract concepts. With careful framework design and interactive human oversight, we suggest that generative language models can offer significant assistance in producing labels and rationales.
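As an illustration of how 3-class and 2-class F1 scores of the kind reported above can relate to each other, the following is a minimal sketch only. The abstract does not state the class definitions, the averaging method, or how the 2-class task is derived, so the integer labels `0`, `1`, `2`, the macro averaging, the merge of classes `1` and `2`, and the toy data are all assumptions made for illustration. They are not the authors' setup or results.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def collapse(labels, mapping):
    """Merge classes according to `mapping`; unmapped labels are kept as-is."""
    return [mapping.get(label, label) for label in labels]


# Toy, made-up labels (hypothetical classes 0, 1, 2); not data from the paper.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

three_class = macro_f1(y_true, y_pred)

# Assumed 2-class variant: fold class 2 into class 1.
merge = {2: 1}
two_class = macro_f1(collapse(y_true, merge), collapse(y_pred, merge))

print(f"3-class macro F1: {three_class:.3f}")
print(f"2-class macro F1: {two_class:.3f}")
```

Merging classes removes any confusion between the merged classes from the error count, but the resulting score can move in either direction depending on how the remaining errors are distributed, as the toy data shows.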
Original language: English
Journal: Quantitative Science Studies
Publication status: Accepted/In press - 21 Nov 2023

Keywords

  • Generative language models
  • text labelling
  • public value
  • AI patents
  • large-scale classification
  • GPT-4
