Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation on many control tasks.
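The partitioning step named in the abstract can be illustrated with a minimal sketch. The abstract does not state how samples are assigned to the two buffers, so everything below is an assumption made for illustration: the `partition` helper, the `TwoBuffers` container, the `toy_score` function standing in for a learned discriminator, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A transition is reduced to a (state, action) pair of scalars for illustration.
Transition = Tuple[float, float]


@dataclass
class TwoBuffers:
    """Hypothetical container holding the two replay buffers."""
    positive: List[Transition] = field(default_factory=list)
    negative: List[Transition] = field(default_factory=list)


def partition(
    samples: List[Transition],
    score: Callable[[Transition], float],
    threshold: float = 0.5,  # assumed cut-off, not specified in the abstract
) -> TwoBuffers:
    """Split samples into two buffers according to a discriminative score."""
    buffers = TwoBuffers()
    for sample in samples:
        if score(sample) >= threshold:
            buffers.positive.append(sample)
        else:
            buffers.negative.append(sample)
    return buffers


def toy_score(transition: Transition) -> float:
    """Stand-in for a learned discriminator: 1.0 when the action is close
    to the state value, 0.0 otherwise. Purely illustrative."""
    state, action = transition
    return 1.0 if abs(action - state) < 0.1 else 0.0


if __name__ == "__main__":
    demo = [(0.0, 0.05), (1.0, 0.0), (2.0, 2.0)]
    result = partition(demo, toy_score)
    print("positive:", result.positive)
    print("negative:", result.negative)
```

In an off-policy learner, the two buffers would then be sampled when updating the deterministic policy; how rewards or targets are attached to each buffer is left out here because the abstract does not describe it.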

Original language: English
Title of host publication: Proceedings of the 36th AAAI Conference on Artificial Intelligence
Subtitle of host publication: AAAI-22 Technical Tracks 8
Editors: Katia Sycara, Vasant Honavar, Matthijs Spaan
Place of Publication: Washington, D.C.
Publisher: AAAI Press
Pages: 8378-8385
Number of pages: 8
ISBN (Electronic): 9781577358763
Publication status: Published - 28 Jun 2022

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Number: 8
Volume: 36
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Keywords

  • machine learning (ML)
