A Video Face Recognition Leveraging Temporal Information Based on Vision Transformer.
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V(2024)
Abstract
Video face recognition (VFR) has gained significant attention as a promising field combining computer vision and artificial intelligence, revolutionizing identity authentication and verification. Unlike traditional image-based methods, VFR leverages the temporal dimension of video footage to extract comprehensive and accurate facial information. However, VFR heavily relies on robust computing power and advanced noise processing capabilities to ensure optimal recognition performance. This paper introduces a novel length-adaptive VFR framework based on a recurrent-mechanism-driven Vision Transformer, termed TempoViT. TempoViT efficiently captures spatial and temporal information from face videos, enabling accurate and reliable face recognition while mitigating the high GPU memory requirements associated with video processing. By leveraging the reuse of hidden states from previous frames, the framework establishes recurring links between frames, allowing the modeling of long-term dependencies. Experimental results validate the effectiveness of TempoViT, demonstrating its state-of-the-art performance in video face recognition tasks on benchmark datasets including iQIYI-ViD, YTF, IJB-C, and Honda/UCSD.
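The recurrence described in the abstract, where hidden states from the previous frame are reused so that later frames can draw on earlier ones, can be illustrated with a toy sketch. This is not the TempoViT architecture: the abstract gives no layer details, so everything here is an assumption made for illustration, including the function names (`attend`, `encode_video`), the single-head attention without learned projections, the one-frame memory, and the mean pooling. It only shows how processing one frame at a time with a cached state keeps memory use independent of video length while producing a fixed-size embedding for any number of frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Single-head scaled dot-product attention (no learned weights).

    queries, context: lists of equal-length float vectors.
    Each output vector is a weighted average of the context vectors.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        )
    return outputs


def encode_video(frames):
    """Encode a video of any length into one fixed-size vector.

    frames: list of frames, each a list of token vectors.
    Hypothetical scheme: tokens of the current frame attend to themselves
    plus the hidden states cached from the previous frame only.
    """
    memory = []          # hidden states carried over from the previous frame
    frame_vectors = []
    for tokens in frames:
        context = memory + tokens       # recurrent link to the previous frame
        hidden = attend(tokens, context)
        memory = hidden                 # keep one frame of state, not the whole video
        dim = len(hidden[0])
        frame_vectors.append(
            [sum(h[i] for h in hidden) / len(hidden) for i in range(dim)]
        )
    dim = len(frame_vectors[0])
    # Mean over frames gives the same output size for any video length.
    return [sum(f[i] for f in frame_vectors) / len(frame_vectors) for i in range(dim)]
```

Because each frame's hidden states already mix in the previous frame's, information from early frames can propagate forward through the chain even though only one frame of state is held at a time.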
Keywords
Video face recognition, Vision Transformer, Temporal information