Zero-Shot Learning of Human-Object Interactions through Common-sense Knowledge

  • Alessio Sarullo

Student thesis: PhD


Recognising Human-Object Interactions is a fundamental step for an automated system to understand human-centric images. However, the number of possible interactions grows with the product of the number of actions and the number of objects, which makes exhaustive labelling of datasets extremely expensive. Zero-Shot Learning addresses this problem by allowing models to make predictions about unseen classes, i.e., classes that lack training instances. Inspired by work in Psychology and Neurobiology, in this thesis we focus on actions, addressing a challenging and understudied zero-shot setting that involves interactions with unseen actions, as opposed to just unseen combinations of seen actions and objects. We propose two ways of characterising actions that rely on common-sense knowledge, i.e., information that is available to the system prior to training. The first approach is based on affordances: action-object relations that specify which actions can be performed on a given object. In the second approach we consider body part information, identifying dependencies between actions and body part states and defining actions according to such dependencies. Both characterisations are distilled into the model through carefully designed training objectives, and we empirically show that they are effective and complementary to each other for the task of Zero-Shot Human-Object Interaction Recognition.
Date of Award: 1 Aug 2021
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Gavin Brown & Tingting Mu


Keywords
  • Human-Object Interactions
  • Zero-Shot Learning
  • Visual Relationship Detection
