Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models

Abstract

Object pose estimation from a single view remains a challenging problem. In particular, partial observability, occlusions, and object symmetries often result in pose ambiguity. To account for this multimodality, this work proposes training a diffusion-based generative model for 6D object pose estimation. During inference, the trained generative model allows for sampling multiple particles, i.e., pose hypotheses. To distill this information into a single pose estimate, we propose two novel and effective pose selection strategies that do not require any additional training or computationally intensive operations. Moreover, while many existing methods for pose estimation primarily focus on the image domain and only incorporate depth information for final pose refinement, our model operates solely on point cloud data. The model leverages recent advancements in point cloud processing through an SE(3)-equivariant latent space that forms the basis for the selection strategies and improved inference times. Experimental results demonstrate the effectiveness of our design choices and competitive performance on the Linemod dataset.
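The particle-based inference described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the "diffusion" sampler below merely anneals Gaussian noise toward a fixed mode to produce multiple pose hypotheses (unit quaternion plus translation), and the selection strategy shown is a simple training-free medoid rule, standing in for the paper's actual selection strategies.

```python
import numpy as np

def sample_pose_hypotheses(num_particles=8, num_steps=20, seed=0):
    """Toy stand-in for a diffusion-based pose sampler.

    Each particle is a 7-vector: unit quaternion (4) + translation (3).
    A real model would iteratively denoise conditioned on the observed
    point cloud; here we anneal noise toward a hypothetical mode pose
    purely for illustration.
    """
    rng = np.random.default_rng(seed)
    mode = np.array([1.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3])  # hypothetical pose
    particles = rng.normal(size=(num_particles, 7))
    for t in range(num_steps):
        alpha = t / num_steps  # interpolation weight toward the mode
        particles = ((1 - alpha) * particles + alpha * mode
                     + 0.01 * (1 - alpha) * rng.normal(size=particles.shape))
    # Re-normalize the quaternion part so each hypothesis is a valid rotation.
    particles[:, :4] /= np.linalg.norm(particles[:, :4], axis=1, keepdims=True)
    return particles

def select_medoid(particles):
    """Pick the particle closest (on average) to all others -- one simple,
    training-free way to distill the particle set into a single estimate."""
    d = np.linalg.norm(particles[:, None, :] - particles[None, :, :], axis=-1)
    return particles[d.mean(axis=1).argmin()]

hypotheses = sample_pose_hypotheses()
estimate = select_medoid(hypotheses)
```

The medoid rule is attractive in this setting because it requires no extra network and is robust to outlier hypotheses; under pose ambiguity, however, it implicitly commits to the dominant mode of the particle set.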

Publication
In CVPR 2025 Workshop on Event-based Vision
Niklas Funk
PhD Student in Computer Science

My research interests include robotics, reinforcement learning, and dexterous manipulation.