Discriminative Hand-Object Pose Estimation From Depth Images Using Convolutional Neural Networks

  • Duncan Goudie

Student thesis: PhD


This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution is a discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at such a system. It is a two-stage system built from convolutional neural networks. The first stage segments the object away from the hand in the depth image. The resulting hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for the second stage, pose estimation. We show that this 2-channel image yields better pose estimation performance than a single-stage pose estimation system that takes only the input depth map. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image, and show that this approach outperforms random decision forests on this task. We created datasets to train both the hand-object pose estimation stage and the hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically by combining manual annotation with a generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been, or are in the process of being, publicly released.
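As a rough illustration of how the second-stage input described above could be assembled, the following Python sketch stacks a depth image with a hand-minus-object copy of it. It is a minimal sketch, not the thesis implementation: the function name `make_two_channel_input`, the boolean `object_mask` format, the `background` fill value, and the channel-first layout are all assumptions made for the example.

```python
import numpy as np


def make_two_channel_input(depth, object_mask, background=0.0):
    """Stack the raw depth image with a hand-minus-object copy of it.

    depth:       (H, W) array of depth values.
    object_mask: (H, W) boolean array, True where the segmentation stage
                 labelled a pixel as object (hypothetical mask format).
    background:  value written over object pixels (an assumption; the
                 abstract does not say how removed pixels are filled).
    """
    depth = np.asarray(depth, dtype=np.float32)
    object_mask = np.asarray(object_mask, dtype=bool)
    if depth.shape != object_mask.shape:
        raise ValueError("depth and object_mask must have the same shape")

    # Channel 1: the depth image with object pixels replaced by the fill value.
    hand_only = np.where(object_mask, np.float32(background), depth)

    # Channel 0 is the original depth image; result has shape (2, H, W).
    return np.stack([depth, hand_only], axis=0)


# Tiny usage example with a 2x2 depth image and one object pixel.
depth = np.array([[0.5, 0.6], [0.7, 0.8]], dtype=np.float32)
mask = np.array([[False, True], [False, False]])
two_channel = make_two_channel_input(depth, mask)
```

In this sketch the original depth values are kept in the first channel, so a downstream pose network would still see the object while also receiving the segmented hand in the second channel.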
Date of Award: 31 Dec 2018
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Toby Howard (Supervisor) & Aphrodite Galata (Supervisor)


  • particle swarm optimisation
  • depth images
  • semantic segmentation
  • pose estimation
  • random forests
  • optimisation
  • machine learning
  • computer vision
  • convolutional neural networks
