LLaNA: Large Language and NeRF Assistant

NeurIPS 2024

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality that simultaneously encodes the geometry and photorealistic appearance of objects. This paper investigates the feasibility and effectiveness of ingesting NeRFs into an MLLM. We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning and Q&A. Notably, our method directly processes the weights of the NeRF's MLP to extract information about the represented objects, without the need to render images or materialize 3D data structures. Moreover, we build a dataset of NeRFs with text annotations for various NeRF-language tasks, with no human intervention. Based on this dataset, we develop a benchmark to evaluate the NeRF understanding capability of our method. Results show that processing NeRF weights compares favourably with extracting 2D or 3D representations from NeRFs.
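The central idea above is that a NeRF stores an object in the weights of an MLP, and that those weights can be consumed directly instead of being queried to render images. The toy sketch below contrasts the two access paths. Everything in it is an assumption for illustration: the names `init_mlp`, `query_nerf`, `flatten_weights` and `weights_to_tokens`, the layer sizes, and the chunking scheme are hypothetical and are not LLaNA's architecture. The abstract does not specify how the weights are encoded before they reach the language model.

```python
import math
import random

# Hypothetical toy NeRF: a one-hidden-layer MLP mapping a 3D point to (r, g, b, density).
# Real NeRFs also take a view direction and use positional encoding; both are omitted here.

def init_mlp(in_dim=3, hidden=8, out_dim=4, seed=0):
    """Build deterministic random weights as a list of (weight_matrix, bias) layers."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        return w, b

    return [layer(in_dim, hidden), layer(hidden, out_dim)]

def linear(w, b, x):
    """Compute w @ x + b with plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]

def query_nerf(params, point):
    """Conventional access: evaluate the field at one 3D point (what rendering does many times)."""
    (w1, b1), (w2, b2) = params
    h = [max(0.0, v) for v in linear(w1, b1, point)]      # ReLU hidden layer
    out = linear(w2, b2, h)
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in out[:3]]   # sigmoid keeps colour in (0, 1)
    density = max(0.0, out[3])                            # ReLU keeps density non-negative
    return rgb, density

def flatten_weights(params):
    """Weight-space access: the whole object as one flat vector of numbers."""
    flat = []
    for w, b in params:
        for row in w:
            flat.extend(row)
        flat.extend(b)
    return flat

def weights_to_tokens(flat, token_dim):
    """Hypothetical: split the weight vector into fixed-size, zero-padded chunks
    that a learned encoder could embed, with no rendering involved."""
    pad = (-len(flat)) % token_dim
    padded = flat + [0.0] * pad
    return [padded[i:i + token_dim] for i in range(0, len(padded), token_dim)]
```

With the default sizes, `flatten_weights(init_mlp())` yields 3·8 + 8 + 8·4 + 4 = 68 numbers, and `weights_to_tokens(..., 16)` turns them into 5 chunks of 16 values, the last one zero-padded. The design point being illustrated is only that the second path reads the object from the parameters themselves rather than from sampled points or images.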

Keywords

LLM, NeRF, VQA
