PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Computer Vision and Pattern Recognition (2024)
Abstract
Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention which leverages the redundancy of visual features to restrict the computation and improve the efficiency without harming the performance. In addition, PEM introduces an efficient multi-scale feature pyramid network, capable of extracting features that have high semantic content in an efficient way, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two different datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while being comparable and even better than computationally-expensive baselines.
Keywords
Computer Vision, Efficient Neural Network, Image Segmentation