What is Point Supervision Worth in Video Instance Segmentation?

Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Computer Vision and Pattern Recognition (2024)

Abstract

Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed training method consists of a class-agnostic proposal generation module to provide rich negative samples and a spatio-temporal point-based matcher to match the object queries with the provided point annotations. Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.
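To make the idea of matching object queries to point annotations more concrete, here is a minimal sketch. It is not the paper's implementation: the cost function (one minus the predicted mask probability at each annotated point, averaged over frames), the brute-force assignment, and all names (`point_cost`, `match_queries_to_points`, the toy data) are assumptions chosen for illustration only. It also assumes every annotated object has at least one point.

```python
from itertools import permutations


def point_cost(mask_probs, points):
    """Average (1 - p) at the annotated points of one object (assumed cost).

    mask_probs: list over frames of H x W nested lists with values in [0, 1].
    points: dict mapping frame index -> (row, col) of the annotated point.
    """
    total = 0.0
    for t, (r, c) in points.items():
        total += 1.0 - mask_probs[t][r][c]
    return total / len(points)


def match_queries_to_points(query_masks, object_points):
    """Return {object_index: query_index} minimising the summed point cost.

    Brute force over one-to-one assignments; only suitable for toy sizes.
    """
    n_q, n_o = len(query_masks), len(object_points)
    if n_o > n_q:
        raise ValueError("need at least as many queries as annotated objects")
    cost = [[point_cost(query_masks[q], object_points[o]) for q in range(n_q)]
            for o in range(n_o)]
    best, best_cost = None, float("inf")
    for perm in permutations(range(n_q), n_o):
        c = sum(cost[o][perm[o]] for o in range(n_o))
        if c < best_cost:
            best, best_cost = perm, c
    return {o: q for o, q in enumerate(best)}


# Toy data (hypothetical): 3 queries, 2 frames, 2 x 2 mask probability maps.
q0 = [[[0.9, 0.1], [0.1, 0.1]], [[0.8, 0.1], [0.1, 0.1]]]  # fires at (0, 0)
q1 = [[[0.1, 0.1], [0.1, 0.9]], [[0.1, 0.1], [0.1, 0.7]]]  # fires at (1, 1)
q2 = [[[0.2, 0.2], [0.2, 0.2]], [[0.2, 0.2], [0.2, 0.2]]]  # fires nowhere

# One point per object per frame: {frame: (row, col)}.
obj0 = {0: (1, 1), 1: (1, 1)}
obj1 = {0: (0, 0), 1: (0, 0)}

matches = match_queries_to_points([q0, q1, q2], [obj0, obj1])
print(matches)  # {0: 1, 1: 0}
```

In this toy setup the third query stays unmatched. A practical matcher would replace the brute-force search with a Hungarian-style solver, since the number of assignments grows factorially with the number of queries.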

Keywords

video instance segmentation, point supervision
