PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Computer Vision and Pattern Recognition (2024)
Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic segmentation, under a single unified framework. To achieve such performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention that leverages the redundancy of visual features to restrict the computation and improve efficiency without harming performance. In addition, PEM introduces an efficient multi-scale feature pyramid network that extracts features with high semantic content efficiently, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while matching or even surpassing computationally expensive baselines.
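The prototype-based cross-attention idea can be illustrated, loosely, by contrasting it with standard softmax cross-attention. The sketch below is a minimal illustration under stated assumptions, not the PEM implementation: it assumes each object query selects a single prototype, namely the pixel feature with the highest dot-product similarity, instead of aggregating all pixel features with softmax weights. The function names `softmax_cross_attention` and `prototype_cross_attention` are hypothetical, and how PEM actually uses the selected prototypes to update the queries is deliberately left out.

```python
import numpy as np


def softmax_cross_attention(queries: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """Standard cross-attention: each query averages ALL pixel features.

    queries: (N, d) object queries; pixels: (HW, d) flattened pixel features.
    """
    d = queries.shape[-1]
    logits = queries @ pixels.T / np.sqrt(d)             # (N, HW) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over the HW pixels
    return weights @ pixels                              # (N, d) weighted average


def prototype_cross_attention(queries: np.ndarray, pixels: np.ndarray):
    """Hypothetical prototype variant (an assumption, not PEM's exact rule):
    each query keeps only its single most similar pixel feature."""
    similarity = queries @ pixels.T          # (N, HW) dot-product similarities
    idx = similarity.argmax(axis=1)          # one prototype index per query
    prototypes = pixels[idx]                 # (N, d): no softmax, no weighted sum
    return prototypes, idx


if __name__ == "__main__":
    pixels = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # HW = 3, d = 2
    queries = np.array([[2.0, 0.0], [0.0, 3.0]])             # N = 2
    protos, idx = prototype_cross_attention(queries, pixels)
    print(idx)  # [0 1]
    print(protos)
    print(softmax_cross_attention(queries, pixels))
```

In this toy setting, each query picks the one pixel it is most aligned with, whereas the softmax version returns a weighted average over all pixels. The sketch only shows the selection idea; where the efficiency gains come from in practice depends on the full architecture.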
Computer Vision, Efficient Neural Network, Image Segmentation