SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
CoRR (2024)

Abstract
Recent advancements in text-to-image generation have been propelled by the
development of diffusion models and multi-modality learning. However, since
text is typically represented sequentially in these models, it often falls
short in providing accurate contextualization and structural control. As a
result, the generated images do not consistently align with human expectations,
especially in complex scenarios involving multiple objects and relationships.
In this paper, we introduce the Scene Graph Adapter (SG-Adapter), which leverages the
structured representation of scene graphs to rectify inaccuracies in the
original text embeddings. The SG-Adapter's explicit, non-fully connected
graph representation substantially improves upon the fully connected,
transformer-based text representations. This enhancement is particularly notable in maintaining
precise correspondence in scenarios involving multiple relationships. To
address the challenges posed by low-quality annotated datasets like Visual
Genome, we have manually curated a highly clean, multi-relational scene
graph-image paired dataset, MultiRels. Furthermore, we design three metrics
derived from GPT-4V to effectively and thoroughly measure the correspondence
between images and scene graphs. Both qualitative and quantitative results
validate the efficacy of our approach in controlling the correspondence in
multiple relationships.
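The contrast the abstract draws between a fully connected text encoder and a scene graph's explicit structure can be illustrated with a small sketch. This is not the paper's actual implementation; the function name, token indices, and triple format below are hypothetical, chosen only to show how (subject, relation, object) triples could induce a sparse attention mask instead of all-to-all token attention.

```python
# Hypothetical sketch: scene-graph triples induce a sparse attention mask,
# unlike a fully connected transformer text encoder where every token
# attends to every other token.

def triple_attention_mask(num_tokens, triples):
    """Return a boolean mask where mask[i][j] is True iff token i may
    attend to token j. Tokens interact only within their own
    (subject, relation, object) triple; every token may attend to itself."""
    # Start with self-attention only.
    mask = [[i == j for j in range(num_tokens)] for i in range(num_tokens)]
    # Allow full attention within each triple's token indices.
    for triple in triples:
        for i in triple:
            for j in triple:
                mask[i][j] = True
    return mask

# Example: a 6-token caption with one triple over hypothetical token
# indices 1 ("cat"), 2 ("sitting on"), and 4 ("table").
mask = triple_attention_mask(6, [(1, 2, 4)])
```

Under this sketch, "cat" and "table" attend to each other because they share a triple, while unrelated tokens do not, which is one way the non-fully connected structure could keep multiple relationships from blending together.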