Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation

In this paper we tackle the problem of estimating the 3D pose of object instances using convolutional neural networks. State-of-the-art methods usually solve the challenging problem of regression in angle space indirectly: they focus on learning discriminative features that are later fed into a separate architecture for 3D pose estimation. In contrast, we propose an end-to-end learning framework that directly regresses object poses by exploiting Siamese networks. For a given image pair, we enforce a similarity measure between the representations of the sample images in feature space and in pose space, which is shown to boost regression performance. Furthermore, we argue that pose-guided feature learning with our Siamese Regression Network generates more discriminative features that outperform the state of the art. Finally, our feature learning formulation can learn features that remain effective under severe occlusions, demonstrating high performance on our novel hand-object dataset.

Contributions

We present the Siamese Regression Network which, to the best of our knowledge, is the first CNN-based framework for regressing object poses in angle space.
We boost the performance of our system by introducing a novel loss function for feature-guided pose regression.
In turn, we show that pose-guided feature learning yields features that are more discriminative than those of [1] and that are shown experimentally to be optimized for the particular task of 3D object pose estimation.
We show how our loss function can be adapted to deal with severe occlusions and evaluate our system on a new, challenging dataset containing an object captured under severe occlusions. Furthermore, experimental evaluation on a benchmark dataset [2] provides evidence that our system outperforms the state of the art.
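
The idea of coupling pose regression with a Siamese similarity term can be illustrated with a minimal sketch. This is not the loss from the paper, whose exact form is not given on this page. It assumes a squared-error regression term for each image of the pair, plus a term that pushes the distance between the two feature vectors towards the distance between the two ground-truth poses. The function name, the `weight` parameter, and the use of plain Euclidean distance (which ignores angle wrap-around) are all assumptions made for illustration.

```python
import numpy as np


def siamese_regression_loss(feat_a, feat_b, pred_a, pred_b, gt_a, gt_b, weight=1.0):
    """Illustrative pair loss: direct pose regression plus a Siamese term.

    All names and the exact form of both terms are assumptions for this
    sketch, not the formulation used in the paper.
    """
    feat_a, feat_b = np.asarray(feat_a, float), np.asarray(feat_b, float)
    pred_a, pred_b = np.asarray(pred_a, float), np.asarray(pred_b, float)
    gt_a, gt_b = np.asarray(gt_a, float), np.asarray(gt_b, float)

    # Regression term: squared error between predicted and ground-truth poses
    # for both images of the pair.
    regression = np.sum((pred_a - gt_a) ** 2) + np.sum((pred_b - gt_b) ** 2)

    # Siamese term: the distance between the two feature vectors should match
    # the distance between the two ground-truth poses.
    feature_distance = np.linalg.norm(feat_a - feat_b)
    pose_distance = np.linalg.norm(gt_a - gt_b)
    similarity = (feature_distance - pose_distance) ** 2

    return float(regression + weight * similarity)
```

In this sketch the loss is zero only when both poses are predicted exactly and the feature distance equals the pose distance, so the features are shaped by the pose labels instead of being learned separately from the regressor.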

Results

Downloads

Paper

Dataset: We used the one available in [2]. Our new hand-object dataset will be available soon!

References

[1] "Learning Descriptors for Object Recognition and 3D Pose Estimation", P. Wohlhart and V. Lepetit, CVPR 2015

[2] "Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes", S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. R. Bradski, K. Konolige, N. Navab, ACCV 2012