What is Point Supervision Worth in Video Instance Segmentation?
Computer Vision and Pattern Recognition (2024)
Abstract
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely annotated object masks, which are expensive to obtain. We reduce the human annotation to a single point per object in each video frame during training and obtain high-quality mask predictions close to those of fully supervised models. Our proposed training method consists of a class-agnostic proposal generation module that provides rich negative samples and a spatio-temporal point-based matcher that matches object queries with the provided point annotations. Comprehensive experiments on three VIS benchmarks demonstrate the competitive performance of the proposed framework, which nearly matches fully supervised methods.
Keywords
video instance segmentation, point supervision
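The abstract mentions a spatio-temporal point-based matcher but does not give its cost function, so the following is only a minimal sketch of how object queries might be matched to point annotations across frames. Everything in it is an assumption made for illustration: the names `point_cost` and `match_queries_to_points`, the cost (one minus the predicted mask probability at each annotated point, averaged over the frames where the object is annotated), and the brute-force search over assignments, which is practical only for toy sizes. It should not be read as the paper's actual method.

```python
from itertools import permutations


def point_cost(query_masks, points):
    """Return cost[q][o] for every query q and annotated object o.

    query_masks[q][t][y][x] is a predicted mask probability in [0, 1].
    points[o] maps a frame index t to the annotated (y, x) point.
    Assumes every object has at least one annotated point.
    """
    cost = []
    for masks in query_masks:
        row = []
        for obj_points in points:
            # Hypothetical cost: low when the mask is confident at the point.
            vals = [1.0 - masks[t][y][x] for t, (y, x) in obj_points.items()]
            row.append(sum(vals) / len(vals))
        cost.append(row)
    return cost


def match_queries_to_points(query_masks, points):
    """Assign each object a distinct query, minimizing the total cost."""
    cost = point_cost(query_masks, points)
    best = min(
        permutations(range(len(query_masks)), len(points)),
        key=lambda qs: sum(cost[q][o] for o, q in enumerate(qs)),
    )
    return {o: q for o, q in enumerate(best)}


# Toy example: 3 queries, 2 frames, 2x2 masks, 2 annotated objects.
query_masks = [
    [[[0.9, 0.1], [0.1, 0.1]], [[0.8, 0.1], [0.1, 0.1]]],  # fires at (0, 0)
    [[[0.1, 0.1], [0.1, 0.9]], [[0.1, 0.1], [0.1, 0.7]]],  # fires at (1, 1)
    [[[0.2, 0.2], [0.2, 0.2]], [[0.2, 0.2], [0.2, 0.2]]],  # fires nowhere
]
points = [
    {0: (1, 1), 1: (1, 1)},  # object 0 annotated in both frames
    {0: (0, 0)},             # object 1 annotated in frame 0 only
]
assignment = match_queries_to_points(query_masks, points)
print(assignment)  # maps object index -> matched query index
```

In this toy setup the third query is left unmatched. A real implementation would likely replace the brute-force search with a Hungarian solver and would need some rule for unmatched queries, which this sketch leaves out.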