Crossway Diffusion: Improving Diffusion-based Visuomotor Policy Via Self-supervised Learning
IEEE International Conference on Robotics and Automation(2024)
摘要
Diffusion models have been adopted for behavioral cloning in a sequence modeling fashion, benefiting from their exceptional capabilities in modeling complex data distributions. The standard diffusion-based policy iteratively denoises action sequences from random noise conditioned on the input states and the model is typically trained with a singular diffusion loss. This paper explores the potential enhancements in such models when the denoising process is informed by a better visual representation. We study the scenario where the model is jointly optimized using the standard diffusion loss alongside an auxiliary objective based on self-supervised learning. After experimenting with various objectives, we introduce Crossway Diffusion, a simple yet effective way to enhance diffusion-based visuomotor policy learning via a state decoder and an auxiliary reconstruction objective. During training, the state decoder reconstructs raw image pixels and other states from the intermediate representations of the model. Experiments demonstrate the effectiveness of our method in various simulated and real-world tasks, confirming its consistent advantages over the standard diffusion-based policy and other baselines.
更多查看译文
关键词
Visual Learning,Imitation Learning,Representation Learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
数据免责声明
页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果,我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问,可以通过电子邮件方式联系我们:report@aminer.cn