Abstract
Sample efficiency is crucial for imitation learning methods to be practical in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions can either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation achieves good sample efficiency, outperforming several off-policy extensions of adversarial imitation on many control tasks.
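The two-buffer step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say how samples are partitioned or how each buffer is used afterwards, so everything here is an assumption made for illustration: the `TwoBufferReplay` class, the `score_fn` "expert-likeness" function, the `threshold`, and the 1/0 reward labels are hypothetical and are not taken from the paper.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# (state, action, next_state); plain tuples keep the sketch dependency-free.
Transition = Tuple[tuple, tuple, tuple]


class TwoBufferReplay:
    """Hypothetical helper that routes transitions into one of two buffers."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.positive: Deque[Transition] = deque(maxlen=capacity)
        self.negative: Deque[Transition] = deque(maxlen=capacity)

    def add(
        self,
        transition: Transition,
        score_fn: Callable[[tuple, tuple], float],
        threshold: float = 0.5,
    ) -> None:
        # Assumption: score_fn returns an expert-likeness score in [0, 1].
        state, action, _ = transition
        if score_fn(state, action) >= threshold:
            self.positive.append(transition)
        else:
            self.negative.append(transition)

    def sample(
        self, batch_size: int, rng: random.Random
    ) -> List[Tuple[Transition, float]]:
        # Assumption: draw about half the batch from each buffer and label
        # positive samples with reward 1.0 and negative samples with 0.0.
        half = batch_size // 2
        pos = rng.sample(list(self.positive), min(half, len(self.positive)))
        neg = rng.sample(
            list(self.negative), min(batch_size - half, len(self.negative))
        )
        return [(t, 1.0) for t in pos] + [(t, 0.0) for t in neg]


# Toy usage: actions equal to (0.0,) are treated as expert-like.
def toy_score(state: tuple, action: tuple) -> float:
    return 1.0 if action == (0.0,) else 0.0


buffers = TwoBufferReplay()
for i in range(10):
    action = (0.0,) if i % 2 == 0 else (1.0,)
    buffers.add(((float(i),), action, (float(i + 1),)), toy_score)

batch = buffers.sample(4, random.Random(0))
```

A batch labelled this way could then be passed to any off-policy learner with a deterministic policy; which learner the authors use is not stated in the abstract.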
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 36th AAAI Conference on Artificial Intelligence |
| Subtitle of host publication | AAAI-22 Technical Tracks 8 |
| Editors | Katia Sycara, Vasant Honavar, Matthijs Spaan |
| Place of Publication | Washington, D.C. |
| Publisher | AAAI Press |
| Pages | 8378-8385 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781577358763 |
| DOIs | |
| Publication status | Published - 28 Jun 2022 |
Publication series
| Name | Proceedings of the AAAI Conference on Artificial Intelligence |
|---|---|
| Publisher | AAAI Press |
| Number | 8 |
| Volume | 36 |
| ISSN (Print) | 2159-5399 |
| ISSN (Electronic) | 2374-3468 |
Keywords
- machine learning (ML)
Fingerprint
Dive into the research topics of 'Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency'. Together they form a unique fingerprint.
Projects
- 1 Active
- MCAIF: Centre for AI Fundamentals
Kaski, S. (PI), Alvarez, M. (Researcher), Pan, W. (Researcher), Mu, T. (Researcher), Rivasplata, O. (PI), Sun, M. (PI), Mukherjee, A. (PI), Caprio, M. (PI), Sonee, A. (Researcher), Leroy, A. (Researcher), Wang, J. (Researcher), Lee, J. (Researcher), Parakkal Unni, M. (Researcher), Sloman, S. (Researcher), Menary, S. (Researcher), Quilter, T. (Researcher), Hosseinzadeh, A. (PGR student), Mousa, A. (PGR student), Glover, E. (PGR student), Das, A. (PGR student), DURSUN, F. (PGR student), Zhu, H. (PGR student), Abdi, H. (PGR student), Dandago, K. (PGR student), Piriyajitakonkij, M. (PGR student), Rachman, R. (PGR student), Shi, X. (PGR student), Keany, T. (PGR student), Liu, X. (PGR student), Jiang, Y. (PGR student), Wan, Z. (PGR student), Harrison, M. (Support team), Hartford, J. (PI), Kangin, D. (Researcher), Harikumar, H. (PI), Dubey, M. (PI), Parakkal Unni, M. (PI), Dash, S. P. (PGR student), Mi, X. (PGR student), Barlas, Y. (PGR student), Osho, T. (Support team) & Tariq, M. (Support team)
1/10/21 → 30/09/26
Project: Research