QLDT: Adaptive Query Learning for HOI Detection Via Vision-Language Knowledge Transfer
Applied Intelligence (2024)
Abstract
Human-object interaction (HOI) detection comprises two core problems: human-object association detection and interaction understanding. For association detection, previous methods tend to detect only obvious human-object interaction pairs and ignore pairs that may have potential interaction relationships, which does not match real-world conditions. For interaction understanding, traditional methods struggle with long-tailed distributions and zero-shot detection, and so cannot flexibly handle complex, changing real-world scenarios. To this end, adaptive Query Learning for HOI Detection via vision-language knowledge Transfer (QLDT) is proposed. First, a two-stage dynamic matching scoring algorithm based on dynamically changing thresholds and scores is designed to explore obscure H-O pairs and label them, enlarging the sample size. Second, a vision-language pre-trained model, GLIP (Grounded Language-Image Pre-training), is introduced to enhance the model's interaction understanding: the visual and linguistic features of the images are extracted through GLIP, the gap with the predicted values is minimized using a cross-entropy loss, and the maximum of this score and the score of the obscure H-O pairs is taken as the final prediction, which ensures the model’s positivity. The proposed method shows excellent performance on both the HICO-DET and V-COCO datasets; for HICO-DET, QLDT achieved 35.37
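The abstract does not give the details of the two-stage matching or the score fusion, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a first stage that keeps pairs above a fixed high threshold, a second stage that admits "obscure" pairs under a threshold that shrinks each round, and a final score taken as the maximum of two candidate scores. All names and values here (`Pair`, `two_stage_select`, `final_score`, `high`, `low_start`, `decay`, `rounds`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    human: int      # hypothetical index of the human box
    obj: int        # hypothetical index of the object box
    score: float    # hypothetical matching score in [0, 1]


def two_stage_select(pairs, high=0.7, low_start=0.5, decay=0.9, rounds=3):
    """Split candidate pairs into obvious and obscure sets.

    Stage 1 keeps pairs whose score reaches the fixed `high` threshold.
    Stage 2 repeatedly lowers a second threshold (assumed multiplicative
    decay) and admits the remaining pairs that reach it.
    """
    obvious = [p for p in pairs if p.score >= high]
    remaining = [p for p in pairs if p.score < high]

    obscure = []
    threshold = low_start
    for _ in range(rounds):
        obscure.extend(p for p in remaining if p.score >= threshold)
        remaining = [p for p in remaining if p.score < threshold]
        threshold *= decay  # the threshold changes dynamically each round
    return obvious, obscure


def final_score(vision_language_score, obscure_pair_score):
    """Take the larger of the two scores as the final prediction."""
    return max(vision_language_score, obscure_pair_score)


if __name__ == "__main__":
    candidates = [Pair(0, 0, 0.9), Pair(0, 1, 0.55), Pair(1, 1, 0.42), Pair(1, 2, 0.1)]
    obvious, obscure = two_stage_select(candidates)
    print([p.score for p in obvious], [p.score for p in obscure])
    print(final_score(0.3, 0.6))
```

In this sketch the pair with score 0.42 is rejected in the first two rounds of stage 2 and admitted only once the threshold has decayed below it, which is the kind of behavior a dynamically changing threshold is meant to produce. The decay schedule and threshold values are arbitrary choices for illustration.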
Keywords
HOI detection,Cross-modal,Interactive understanding,Query learning,Knowledge transfer