Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Computer Vision and Pattern Recognition (2024)
Abstract
Deep Text-to-Image Synthesis (TIS) models such as Stable Diffusion have recently gained significant popularity for creative text-to-image generation. Yet, for domain-specific scenarios, tuning-free Text-guided Image Editing (TIE) is of greater importance for application developers; it modifies objects or object properties in images by manipulating feature components in attention layers during the generation process. However, little is known about what semantic meanings these attention layers have learned and which parts of the attention maps contribute to the success of image editing. In this paper, we conduct an in-depth probing analysis and demonstrate that cross-attention maps in Stable Diffusion often contain object attribution information that can result in editing failures. In contrast, self-attention maps play a crucial role in preserving the geometric and shape details of the source image during the transformation to the target image. Our analysis offers valuable insights into understanding cross- and self-attention maps in diffusion models. Moreover, based on our findings, we simplify popular image editing methods and propose a more straightforward yet more stable and efficient tuning-free procedure that only modifies the self-attention maps of the specified attention layers during the denoising process. Experimental results show that our simplified method consistently surpasses the performance of popular approaches on multiple datasets.
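The core operation the abstract describes — recording self-attention maps from the source-image denoising pass and injecting them into the target pass so that geometry and shape are preserved — can be sketched in a few lines. The code below is a minimal illustration with NumPy, not the authors' implementation: the toy `self_attention` function, the tensor shapes, and the two-pass setup are all assumptions made for clarity.

```python
import numpy as np

def self_attention(q, k, v, attn_override=None):
    """Scaled dot-product self-attention. If attn_override is given,
    the locally computed attention map is replaced by the injected one
    (e.g. a map recorded from the source image's denoising pass)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    if attn_override is not None:
        attn = attn_override  # injection step: reuse the source map
    return attn @ v, attn

rng = np.random.default_rng(0)
tokens, dim = 4, 8  # toy sizes; real layers operate on spatial tokens
q_src, k_src, v_src = (rng.normal(size=(tokens, dim)) for _ in range(3))
q_tgt, k_tgt, v_tgt = (rng.normal(size=(tokens, dim)) for _ in range(3))

# Pass 1 (source prompt): record the self-attention map, which carries
# the spatial layout of the source image.
_, src_map = self_attention(q_src, k_src, v_src)

# Pass 2 (target prompt): reuse the source map in the specified layers,
# so the edit changes appearance while preserving geometry.
out, used_map = self_attention(q_tgt, k_tgt, v_tgt, attn_override=src_map)
```

In the paper's setting this replacement would happen inside selected self-attention layers of the U-Net at chosen denoising steps, rather than in a standalone function as here.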
Keywords
image editing, attention map, probing analysis