InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

NeurIPS 2024 (2024)

Abstract

Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers high-level control over interaction semantics, it cannot grasp the intricacies of low-level interaction dynamics. To overcome this issue, we further introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner. We apply InterDreamer to the BEHAVE and CHAIRS datasets, and our comprehensive experimental analysis demonstrates its capability to generate realistic and coherent interaction sequences that seamlessly align with the text directives.

Keywords

human object interaction, human motion generation
