Abstract
Estimating the 3D pose of a hand interacting with an object is challenging, and harder than hand-only pose estimation because the object can heavily occlude the hand. We present a two-stage discriminative approach using convolutional neural networks (CNNs). The first stage classifies and segments the object pixels in a depth image containing the hand and the object. The resulting processed image carries information about the object's location and the occlusion it causes, and is used to aid the second stage in estimating the hand-object pose. To the best of our knowledge, this is the first attempt at discriminative one-shot hand-object pose estimation. We show that this approach outperforms the current state of the art and that adding a segmentation stage to learned discriminative single-stage systems improves their performance.
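The data flow of the two stages can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy threshold segmenter, the zero-valued pose regressor, the joint count, and the choice to pass the mask to the second stage as an extra channel are all assumptions made for illustration. In the paper both stages are CNNs, and the abstract does not specify how the processed image is encoded.

```python
import numpy as np


def segment_object(depth, segmenter):
    """Stage 1: label each depth pixel as object (1) or non-object (0).

    `segmenter` is a hypothetical stand-in for the segmentation CNN.
    """
    return segmenter(depth).astype(np.uint8)


def build_stage_two_input(depth, mask):
    """Combine the depth image and the object mask into one array.

    Assumption: the mask is stacked as a second channel, so the second
    stage can see where the object is and which pixels it covers.
    """
    return np.stack([depth, mask.astype(depth.dtype)], axis=0)


def two_stage_pipeline(depth, segmenter, pose_regressor):
    """Run stage 1, then feed its output to stage 2.

    `pose_regressor` is a hypothetical stand-in for the pose CNN.
    """
    mask = segment_object(depth, segmenter)
    stage_two_input = build_stage_two_input(depth, mask)
    pose = pose_regressor(stage_two_input)
    return mask, pose


# Toy stand-ins so the sketch runs end to end (not learned models).
def toy_segmenter(depth):
    # Pretend that pixels closer than 0.5 belong to the object.
    return depth < 0.5


def toy_pose_regressor(stage_two_input, num_joints=4):
    # Return a dummy (num_joints, 3) array of 3D joint positions.
    return np.zeros((num_joints, 3), dtype=np.float32)


depth_image = np.array([[0.2, 0.8], [0.9, 0.4]], dtype=np.float32)
object_mask, hand_pose = two_stage_pipeline(
    depth_image, toy_segmenter, toy_pose_regressor
)
```

Replacing the two toy functions with trained networks would leave the pipeline structure unchanged, which is the point of the sketch: the second stage consumes the first stage's output rather than the raw depth image alone.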
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Automatic Face & Gesture Recognition |
DOIs | |
Publication status | Published - 2017 |