Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
CoRR (2024)
Abstract
This study presents Medical Vision Generalist (MVG), the first foundation
model capable of handling various medical imaging tasks – such as cross-modal
synthesis, image segmentation, denoising, and inpainting – within a unified
image-to-image generation framework. Specifically, MVG employs an in-context
generation strategy that standardizes the handling of inputs and outputs as
images. By treating these tasks as an image generation process conditioned on
prompt image-label pairs and input images, this approach enables a flexible
unification of various tasks, even those spanning different modalities and
datasets. To capitalize on both local and global context, we design a hybrid
method combining masked image modeling with autoregressive training for
conditional image generation. This hybrid approach yields the most robust
performance across all involved medical imaging tasks. To rigorously evaluate
MVG's capabilities, we curated the first comprehensive generalist medical
vision benchmark, comprising 13 datasets and spanning four imaging modalities
(CT, MRI, X-ray, and micro-ultrasound). Our results consistently establish
MVG's superior performance, outperforming existing vision generalists, such as
Painter and LVM. Furthermore, MVG exhibits strong scalability, with its
performance demonstrably improving when trained on a more diverse set of tasks,
and can be effectively adapted to unseen datasets with only minimal
task-specific samples. The code is available at
.
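The in-context formulation described above, where a prompt image-label pair and an input image condition the generation of an output image, can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify the canvas layout, patch size, or masking ratio, so the 2x2 tiling and every name below (`make_canvas`, `random_patch_mask`, `target_only_mask`) are assumptions made for illustration only. The sketch shows two things: how four same-sized images might be tiled into one canvas, and two patch masks, a random one in the spirit of masked image modeling and one that hides the whole target, as happens when the output must be generated entirely from context. It does not implement autoregressive training.

```python
import random


def make_canvas(prompt_img, prompt_lbl, query_img, target):
    """Tile four equally sized 2D images (lists of rows) into one canvas.

    Assumed layout (hypothetical, not taken from the paper):
        top row:    prompt image | prompt label
        bottom row: query image  | target
    """
    h, w = len(prompt_img), len(prompt_img[0])
    for im in (prompt_lbl, query_img, target):
        if len(im) != h or any(len(row) != w for row in im):
            raise ValueError("all four images must share one shape")
    top = [a + b for a, b in zip(prompt_img, prompt_lbl)]
    bottom = [a + b for a, b in zip(query_img, target)]
    return top + bottom


def random_patch_mask(rows, cols, ratio, rng):
    """Masked-image-modeling style mask: hide a random subset of patches.

    Returns a rows x cols grid of booleans; True means "hidden".
    """
    n = rows * cols
    hidden = set(rng.sample(range(n), int(n * ratio)))
    return [[(r * cols + c) in hidden for c in range(cols)] for r in range(rows)]


def target_only_mask(rows, cols):
    """Hide exactly the bottom-right quadrant, i.e. the whole target.

    This mirrors the situation where the output is unknown and must be
    produced from the prompt pair and the query image alone.
    """
    return [[r >= rows // 2 and c >= cols // 2 for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    img = [[1, 2], [3, 4]]   # toy prompt image
    lbl = [[0, 1], [1, 0]]   # toy prompt label (e.g. a segmentation map)
    qry = [[5, 6], [7, 8]]   # toy query image
    tgt = [[0, 0], [0, 0]]   # placeholder for the output to be generated
    for row in make_canvas(img, lbl, qry, tgt):
        print(row)
    print(target_only_mask(4, 4))
    print(random_patch_mask(4, 4, 0.5, random.Random(0)))
```

Because inputs and outputs are both images on the same canvas, switching between segmentation, synthesis, denoising, or inpainting in this sketch only changes what is placed in the label and target slots, not the code path.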