As artificial intelligence (AI) becomes increasingly embedded in complex, real-world settings, enabling intelligent agents to adapt seamlessly to and interact with a diverse array of mixed agents has emerged as a pivotal research challenge. Mixed agents span a broad spectrum of mental and physical variation, differing in their incentives, strategies, and embodiments, ranging from humans and AI-driven systems to heterogeneous robots. Against this backdrop, this thesis advances the field of mixed-agent decision-making by introducing comprehensive theoretical frameworks, practical algorithms, and rigorous evaluation methodologies designed to support effective collaboration and competition with co-players unseen during training.
Through a game-theoretic lens, this thesis addresses three fundamental interaction structures: fully cooperative, fully competitive, and mixed-motive, offering new insights into adaptive decision-making in mixed-agent systems.

For fully cooperative games, we identify the cooperative incompatibility (CI) problem in zero-shot coordination (ZSC) and introduce the graphic-form game and the preference graphic-form game to model and assess CI more efficiently. Building on these foundations, we present the Cooperative Open-ended LEarning (COLE) framework, together with practical algorithms, to resolve the CI issue. To validate this approach, we conduct experiments not only with previously unseen AI partners but also through a user-friendly online human-AI experimentation platform, collecting data from over a hundred participants. The results show that our methods outperform state-of-the-art baselines, improving coordination with both unseen AI and human players.

For mixed-motive games, where agents must navigate both cooperative and competitive incentives, we introduce the Altruistic Gradient Adjustment (AgA) algorithm to align individual and collective objectives. By embedding agent strategies and objectives into a differentiable framework, termed a differential mixed-motive game, AgA uses higher-order gradient computations to precisely reconcile these competing interests. We theoretically prove that AgA guides agents toward stable fixed points favoring collective goals while also enabling them to escape unstable fixed points. Comprehensive empirical evaluations, ranging from small-scale public goods games to our custom large-scale StarCraft II scenarios, support these theoretical claims and demonstrate AgA's effectiveness in more complex and scalable mixed-motive environments.

For fully competitive games, we analyze and address non-transitivity, where strategies form cyclical relationships, in Xiangqi (Chinese Chess).
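The gradient-adjustment idea can be illustrated with a toy differentiable public-goods game. The sketch below is a minimal, hypothetical stand-in rather than the thesis's actual AgA update (which uses higher-order gradients): it simply blends each player's selfish gradient with the gradient of the collective loss, which is enough to show how aligning individual and collective objectives turns free-riding dynamics into cooperative ones.

```python
import numpy as np

# Toy differentiable public-goods game (illustrative only; not the
# thesis's AgA algorithm or environments). Each of two players picks a
# contribution c_i in [0, 1]. Contributing costs 1 but returns only
# A < 1 privately, so selfish gradient descent drives contributions to
# zero, even though the collective loss is minimized by contributing.
A = 0.75  # each unit contributed pays A back to every player

def individual_grads(c):
    # d f_i / d c_i for f_i = c_i - A * (c_1 + c_2): positive, so
    # purely selfish descent pushes every contribution toward 0.
    return np.full_like(c, 1.0 - A)

def collective_grads(c):
    # Gradient of w = f_1 + f_2 = (1 - 2A) * (c_1 + c_2): negative,
    # so the collective loss falls as contributions rise.
    return np.full_like(c, 1.0 - 2.0 * A)

def step(c, lam, lr=0.1):
    # Blend each player's selfish gradient with the collective
    # gradient; lam = 0 recovers plain individual gradient descent.
    g = (1.0 - lam) * individual_grads(c) + lam * collective_grads(c)
    return np.clip(c - lr * g, 0.0, 1.0)

c_selfish = np.array([0.5, 0.5])
c_aligned = np.array([0.5, 0.5])
for _ in range(100):
    c_selfish = step(c_selfish, lam=0.0)  # free-riding: drifts to [0, 0]
    c_aligned = step(c_aligned, lam=0.5)  # alignment: drifts to [1, 1]
```

Running both trajectories from the same start shows the qualitative effect: without alignment both players defect, while the blended update carries them to full contribution.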
Non-transitivity not only reduces training efficiency but also leads to failures against unexpected opponents, even after learning has converged. We begin with an empirical analysis of 10,000 real-world human gameplay records, verifying both the spinning-top hypothesis and the severe non-transitivity present in Xiangqi. To overcome this challenge, we introduce the JiangJun algorithm, which extends AlphaZero-based methods into an open-ended learning framework capable of iteratively mitigating non-transitivity. Evaluations conducted through a WeChat mini program show that JiangJun achieves Master-level performance, attaining a 99.41% win rate against human players. Additional metrics, including relative population performance and visual analyses, further confirm the algorithm's effectiveness in resolving non-transitivity.
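Non-transitivity can be made concrete with the smallest possible example, a rock-paper-scissors-style payoff matrix (illustrative only; Xiangqi's strategy space is vastly larger). Because the "beats" relation forms a cycle, no single best strategy exists and no scalar rating can order the strategies, which is exactly why a single converged policy can still fail against an unexpected opponent.

```python
import numpy as np

# A rock-paper-scissors-style payoff matrix: M[i, j] > 0 means
# strategy i beats strategy j. The wins form the cycle 0 -> 1 -> 2 -> 0.
M = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]])

def beats(i, j):
    return M[i, j] > 0

# The relation is cyclic, so no transitive (Elo-style) ordering exists.
cycle = beats(0, 1) and beats(1, 2) and beats(2, 0)

# Equivalently: every strategy is beaten by some other strategy, so no
# single policy is safe against all opponents.
no_dominant = all(any(beats(j, i) for j in range(3)) for i in range(3))
```

Open-ended learning frameworks mitigate this by training against a growing population that covers such cycles, rather than chasing a single best response.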
Building upon these theoretical and algorithmic foundations, the thesis progresses to real-world mixed-robot applications. While recent ZSC methods have focused primarily on two-player video games such as Overcooked! 2 and Hanabi, we advance the field by defining and introducing a real-world multi-player ZSC challenge: the zero-shot multi-drone cooperative pursuit task. Theoretically, we extend the two-player graphic-form game concepts to multi-drone scenarios by integrating hypergraph representations, and we introduce HOLA-Drone, a hypergraphic open-ended learning algorithm that dynamically adjusts learning objectives to improve cooperation among diverse drone teammates. Extensive evaluations against previously unseen drone partners confirm HOLA-Drone's superior performance, and real-world experiments further demonstrate its feasibility in practical, physical settings.
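The hypergraph representation can be sketched as simple bookkeeping; the policy names and scores below are hypothetical and this is not HOLA-Drone itself. Policies are nodes, each evaluated team is a hyperedge (a set of more than two nodes, which an ordinary pairwise graph cannot express), and the hyperedge weight is the team's cooperation score. An open-ended learner can then target the weakest hyperedges when choosing its next learning objective.

```python
from itertools import combinations

# Hypothetical population of drone policies (nodes of the hypergraph).
policies = ["pi_0", "pi_1", "pi_2", "pi_3"]
team_size = 3  # each hyperedge joins 3 policies, not just 2

# Hypothetical cooperation scores from evaluation games: one hyperedge
# per 3-policy team, weighted by how well that team pursued the target.
scores = {
    ("pi_0", "pi_1", "pi_2"): 0.8,
    ("pi_0", "pi_1", "pi_3"): 0.5,
    ("pi_0", "pi_2", "pi_3"): 0.6,
    ("pi_1", "pi_2", "pi_3"): 0.3,
}

# Sanity check: every possible team in the population has been scored.
assert set(scores) == set(combinations(policies, team_size))

def weakest_teams(scores, k=1):
    # An open-ended learner can adjust its objective toward the
    # lowest-scoring hyperedges, i.e. the teammate combinations it
    # currently cooperates with worst.
    return sorted(scores, key=scores.get)[:k]
```

Here `weakest_teams(scores)` surfaces the team `("pi_1", "pi_2", "pi_3")`, the combination a learner in this toy population would prioritize next.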
| Date of Award | 4 Mar 2025 |
|---|---|
| Original language | English |
| Awarding Institution | The University of Manchester |
| Supervisor | Wei Pan (Main Supervisor) & Angelo Cangelosi (Co-Supervisor) |
- Multi-agent system
- Multi-agent reinforcement learning
- Game theory
- Multi-robot system
Adaptive Decision-Making in Mixed-Agent Systems
Li, Y. (Author). 4 Mar 2025
Student thesis: PhD