Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge Graphs

  • Nan Hu
  • Jiaoyan Chen
  • Yike Wu
  • Guilin Qi
  • Hongru Wang
  • Sheng Bi
  • Yongrui Chen
  • Tongtong Wu
  • Jeff Z. Pan
  • Southeast University
  • University of Edinburgh

Research output: Chapter in Book/Conference proceeding; Conference contribution; peer-review

Abstract

Attributed Question Answering (AQA) has attracted wide attention, but evaluating attributions still has several limitations: a lack of fine-grained attribution categories, reliance on manual annotation, and an inability to compare attributions that differ only subtly. To bridge these gaps, we introduce Complex Attributed Question Answering (CAQA), a large-scale benchmark that is automatically generated using Knowledge Graphs (KGs) and covers comprehensive attribution categories and complex attribution scenarios. We have conducted extensive experiments to verify the effectiveness of CAQA, including benchmarking 25 automatic evaluators, comparing them with human evaluators, and testing LLM evaluators fine-tuned on CAQA. These experiments also lead to a series of important findings that can benefit future research on AQA. All code and data are publicly accessible at https://anonymous.4open.science/r/CAQA-Benchmark-C2C7/.
Original language: English
Title of host publication: The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Publication status: Published - 1 Jul 2025

Keywords

  • Large Language Model
  • Attributed Question Answering
  • Knowledge Graph
