POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference

Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Dejia Xu, Hanwen Jiang, Zhangyang Wang

Computer Vision and Pattern Recognition (2024)

Abstract

Despite significant progress in six degrees-of-freedom (6DoF) object pose estimation, existing methods have limited applicability in real-world scenarios involving embodied agents and downstream 3D vision tasks. These limitations mainly stem from the need for 3D models, closed-category detection, and a large number of densely annotated support views. To address these limitations, we propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE). POPE enables zero-shot 6DoF object pose estimation for any target object in any scene, using only a single reference as the support view. To achieve this, POPE leverages a pre-trained large-scale 2D foundation model and employs a framework with hierarchical feature representation and 3D geometry principles. Moreover, it estimates the relative camera pose between object prompts and the target object in new views, enabling both two-view and multi-view 6DoF pose estimation tasks. Comprehensive experimental results demonstrate that POPE is robust in zero-shot settings, reducing the averaged Median Pose Error by 52.38% and 50.47% on the LINEMOD and OnePose datasets, respectively. We also conduct more challenging tests on casually captured images (see Figure 1), which further demonstrate the robustness of POPE. The project page is available at https://paulpanwang.github.io/POPE/.
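The abstract refers to a relative camera pose between the reference and the target view, and to a median pose error, without defining either. As a rough illustration only, the sketch below assumes poses are 4x4 object-to-camera matrices and measures rotation error as the geodesic angle between rotation matrices. The function names (`relative_pose`, `rotation_error_deg`, `median_rotation_error_deg`, `rot_z`) are hypothetical and are not taken from the POPE code base; the paper's exact metric may differ.

```python
import numpy as np

def relative_pose(T_ref, T_tgt):
    """Transform taking reference-camera coordinates to target-camera coordinates.

    Assumes T_ref and T_tgt are 4x4 object-to-camera matrices (an assumption,
    not a convention stated in the abstract).
    """
    return T_tgt @ np.linalg.inv(T_ref)

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_delta = R_pred @ R_gt.T
    cos_angle = (np.trace(R_delta) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def median_rotation_error_deg(preds, gts):
    """Median of per-pair rotation errors over a set of predictions."""
    return float(np.median([rotation_error_deg(p, g) for p, g in zip(preds, gts)]))

def rot_z(deg):
    """Rotation about the z-axis, used here only to build toy inputs."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For example, with toy poses whose rotations are `rot_z(10)` for the reference and `rot_z(40)` for the target, `relative_pose` yields a 30-degree rotation about z, and `rotation_error_deg` compares any predicted relative rotation against it.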

Keywords

Pose Estimation, Real-world Scenarios, Target Object, Camera Pose, Human Pose Estimation, Foundation Model, Relative Pose, Object Pose, Image Registration, Image Object, Large-scale Datasets, Point Cloud, Bounding Box, Target Image, Reference Image, Depth Map, Masked Images, Median Error, CAD Model, View Synthesis, Target View, Arbitrary Objects, Vision Transformer, Sparse Point Cloud, Scale Ambiguity, Point Xi, Translation Vector, Object Segmentation, Pose Estimation Methods
