MAWKDN: A Multimodal Fusion Wavelet Knowledge Distillation Approach Based on Cross-View Attention for Action Recognition
IEEE Transactions on Circuits and Systems for Video Technology (2023)
Abstract
Existing vision-based human action recognition (HAR) methods lose much of their accuracy when camera resolution is low or the subject is occluded. Wearable sensors can supply complementary information that alleviates this problem, but building a robust HAR model from multimodal wearable-sensor data remains challenging. In this paper, we propose a cross-attention-based Multimodal fusion Wavelet Knowledge Distillation Network (MAWKDN) that guides recognition from video data with complementary information acquired from wearable sensors and reduces the effect of noise through wavelet knowledge distillation, which improves the robustness of the model. A multi-attention dilated convolution kernel residual network, combining dilated convolution with an attention mechanism, extracts features from the various sensor modalities, and a cross-view attention method fuses the modalities so that each acquires additional information from the others. To reduce the modal differences between the teacher and student networks and acquire similar semantic knowledge, we construct a graph structure over convolutional-layer features and compute a semantic preservation loss between the teacher and student networks. To reduce the influence of noise in the input data, we construct a wavelet knowledge distillation loss, which transforms the image with the discrete wavelet transform and retains only the low-frequency features to extract the useful information. MAWKDN achieves a top-1 accuracy of 99.31% on UTD-MHAD and 99.40% on Berkeley-MHAD, and an F1-score of 85.26% on MMAct (cross-session setting), outperforming state-of-the-art HAR methods. Moreover, we demonstrate the robustness of MAWKDN on a noise-added UTD-MHAD dataset.
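As a rough illustration of the wavelet knowledge distillation idea described above, the sketch below compares only the low-frequency sub-band of a student and a teacher feature map. The abstract does not specify the wavelet family, the decomposition level, or the distance measure, so the one-level Haar transform, the mean-squared error, and the function names `haar_ll` and `wavelet_kd_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def haar_ll(x):
    """One-level 2D Haar DWT over the last two axes, keeping only the LL sub-band.

    Assumption: an orthonormal Haar wavelet; the paper's wavelet is not stated
    in the abstract.
    """
    h, w = x.shape[-2:]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    # Low-pass in both directions: (a + b + c + d) / 2 for orthonormal Haar.
    return (a + b + c + d) / 2.0


def wavelet_kd_loss(student, teacher):
    """Mean-squared error between the low-frequency bands (hypothetical loss form)."""
    diff = haar_ll(student) - haar_ll(teacher)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    teacher = np.ones((1, 4, 4))
    # Checkerboard pattern: pure high-frequency content that cancels in every 2x2 block.
    checker = (np.indices((4, 4)).sum(axis=0) % 2) * 2 - 1
    noisy_student = teacher + 0.3 * checker
    print(wavelet_kd_loss(noisy_student, teacher))
    print(wavelet_kd_loss(teacher + 1.0, teacher))
```

In this toy case the checkerboard perturbation lives entirely in the discarded high-frequency sub-bands, so it does not contribute to the loss, whereas a constant offset (low-frequency content) does. That is the intuition behind retaining only the low-frequency features to suppress noise.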
Keywords
Human action recognition, multimodal, wavelet knowledge distillation, wearable sensor, attention mechanism