MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
arXiv (2024)

Abstract
Given the remarkable success that large visual language models (LVLMs) have
achieved in image perception tasks, the endeavor to make LVLMs perceive the
world like humans is drawing increasing attention. Current multi-modal
benchmarks primarily focus on facts or specific topic-related knowledge
contained within individual images. However, they often overlook the
associative relations between multiple images, which require the identification
and analysis of similarities among entities or content present in different
images. Therefore, we propose the multi-image relation association task and a
meticulously curated Multi-granularity Multi-image Relational Association
(MMRA) benchmark, comprising 1,024 samples. In order to systematically and
comprehensively evaluate current LVLMs, we establish an associational relation
system among images that contains 11 subtasks (e.g., UsageSimilarity, SubEvent)
at two granularity levels (i.e., image and entity) according to the relations
in ConceptNet. Our experiments reveal that on the MMRA benchmark, current
multi-image LVLMs exhibit distinct advantages and disadvantages across various
subtasks. Notably, fine-grained, entity-level multi-image perception tasks pose
a greater challenge for LVLMs compared to image-level tasks. Moreover, LVLMs
perform poorly on spatial-related tasks, indicating that LVLMs still have
limited spatial awareness. Additionally, our findings indicate that while LVLMs
demonstrate a strong capability to perceive image details, enhancing their
ability to associate information across multiple images hinges on improving the
reasoning capabilities of their language model component. Moreover, we explored
the ability of LVLMs to perceive image sequences within the context of our
multi-image association task. Our experiments show that the majority of current
LVLMs do not adequately model image sequences during the pre-training process.
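To make the benchmark's structure concrete, the sketch below shows a hypothetical MMRA-style sample layout and per-subtask accuracy scoring. The field names, subtask labels, image paths, and the `score_by_subtask` helper are illustrative assumptions, not the benchmark's actual schema or evaluation code.

```python
# Hedged sketch of an MMRA-style evaluation record and scorer.
# All field names and file names below are hypothetical.

from collections import defaultdict

# Each sample pairs multiple images with a relation-association
# multiple-choice question at a given granularity level (image or entity).
samples = [
    {"subtask": "UsageSimilarity", "level": "entity",
     "images": ["img_001.jpg", "img_002.jpg"],
     "question": "Which entities in the two images share a common usage?",
     "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"subtask": "SubEvent", "level": "image",
     "images": ["img_003.jpg", "img_004.jpg"],
     "question": "Is the event in image 2 a sub-event of the event in image 1?",
     "choices": ["A", "B", "C", "D"], "answer": "A"},
]

def score_by_subtask(samples, predictions):
    """Compute accuracy per subtask; predictions is keyed by sample index."""
    correct, total = defaultdict(int), defaultdict(int)
    for i, sample in enumerate(samples):
        total[sample["subtask"]] += 1
        if predictions.get(i) == sample["answer"]:
            correct[sample["subtask"]] += 1
    return {t: correct[t] / total[t] for t in total}

# One correct and one incorrect prediction across the two subtasks.
print(score_by_subtask(samples, {0: "B", 1: "C"}))
```

Reporting accuracy per subtask (rather than a single aggregate score) is what lets the paper compare fine-grained, entity-level tasks against image-level ones.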