Imagination-augmented Deep Reinforcement Learning for Robotic Applications

  • Mohammad Thabet

Student thesis: PhD


Deep reinforcement learning (RL) has recently emerged as a powerful technique that allows agents to solve complex sequential tasks. Its application in robotics, however, has been held back by the enormous amount of interaction data it requires: collecting data with a physical robot is usually prohibitively costly, prompting the need for more sample-efficient reinforcement learning algorithms. This thesis develops an architecture that drastically improves the sample efficiency of RL, leading to faster learning with less data. The architecture incorporates a mechanism that mimics human imagination into model-free learning, allowing agents to simulate scenarios internally and thus lessening the need for actual interaction with the environment.

In this model-assisted setting, an agent learns a stochastic environment model online from experience, simultaneously with a policy. A variational autoencoder (VAE) compresses visual input into abstract representations in a latent space, and a mixture density network (MDN) learns a forward model in that latent space. The agent then uses the learned model to generate imaginary rollouts that augment the real data used to train a controller in an RL context. Uncertainty in the model's predictions is estimated using Monte-Carlo dropout to limit the use of imaginary data, preventing the agent from learning from erroneous model predictions.

The thesis presents experiments involving human-robot interaction scenarios in an RL setting to verify the viability of the approach. The first experiment served as a proof of concept and involved an agent learning a pick-and-place task based on gestures by a human. The second experiment was designed to demonstrate the advantages of the approach and involved a robot learning to solve a puzzle based on gestures.
Results show that the proposed imagination-augmented agents perform significantly better than baseline agents when data is scarce, demonstrating the efficacy of the approach in increasing sample efficiency.
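The model-assisted loop described in the abstract can be sketched roughly as follows. This toy NumPy sketch uses random, untrained weights and hypothetical dimensions purely to illustrate the idea of an MDN-style forward model over a latent space, with Monte-Carlo dropout variance gating which imagined transitions are trusted; it is not the thesis implementation, and all names and thresholds here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class LatentForwardModel:
    """Toy MDN-style forward model over a latent space (illustrative only).

    Hypothetical stand-in for the architecture in the abstract: given a
    latent state z and action a, it outputs a Gaussian mixture over the
    next latent state z'. Weights are random here; in the thesis they
    would be learned online from experience.
    """
    def __init__(self, z_dim=4, a_dim=2, hidden=32, n_mix=3, p_drop=0.1):
        self.n_mix, self.z_dim, self.p_drop = n_mix, z_dim, p_drop
        self.W1 = rng.normal(0.0, 0.1, (z_dim + a_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_mix * (1 + 2 * z_dim)))

    def forward(self, z, a, mc_dropout=True):
        h = np.tanh(np.concatenate([z, a]) @ self.W1)
        if mc_dropout:  # MC dropout: keep dropout active at prediction time
            mask = rng.random(h.shape) > self.p_drop
            h = h * mask / (1.0 - self.p_drop)
        out = h @ self.W2
        k, d = self.n_mix, self.z_dim
        logits = out[:k]
        mu = out[k:k + k * d].reshape(k, d)
        sigma = np.exp(out[k + k * d:].reshape(k, d))
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()                      # mixture weights (softmax)
        return pi, mu, sigma

    def sample_next(self, z, a):
        """Sample one imagined next latent state from the mixture."""
        pi, mu, sigma = self.forward(z, a, mc_dropout=True)
        j = rng.choice(self.n_mix, p=pi)
        return mu[j] + sigma[j] * rng.normal(size=self.z_dim)

def rollout_step_with_uncertainty(model, z, a, n_samples=20):
    """Several stochastic forward passes; their spread estimates uncertainty."""
    preds = np.stack([model.sample_next(z, a) for _ in range(n_samples)])
    return preds.mean(axis=0), float(preds.var(axis=0).mean())

model = LatentForwardModel()
z, a = rng.normal(size=4), rng.normal(size=2)
z_next, uncertainty = rollout_step_with_uncertainty(model, z, a)
use_imagined = uncertainty < 1.0  # illustrative threshold: discard uncertain imagined data
```

A real agent would encode camera frames into `z` with the VAE, unroll the model for several imagined steps, and feed only low-uncertainty imagined transitions to the RL update alongside real ones.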
Date of Award: 1 Aug 2022
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Jonathan Shapiro & Angelo Cangelosi


  • Artificial intelligence
  • deep learning
  • reinforcement learning
  • robotics
