SMTDKD: A Semantic-Aware Multimodal Transformer Fusion Decoupled Knowledge Distillation Method for Action Recognition
IEEE Sensors Journal (2024)
Abstract
Multimodal sensors, including vision sensors and wearable sensors, offer valuable complementary information for accurate recognition tasks. Nonetheless, the heterogeneity among sensor data from different modalities presents a formidable challenge in extracting robust multimodal information amidst noise. In this article, we propose an innovative approach, the semantic-aware multimodal transformer fusion decoupled knowledge distillation (SMTDKD) method, which guides video data recognition not only through the information interaction between different wearable-sensor data, but also through the information interaction between visual-sensor data and wearable-sensor data, improving the robustness of the model. To preserve the temporal relationships within wearable-sensor data, the SMTDKD method converts them into 2-D image data. Furthermore, a transformer-based multimodal fusion module is designed to capture diverse feature information from distinct wearable-sensor modalities. To mitigate modality discrepancies and encourage similar semantic features, graph cross-view attention maps are constructed across various convolutional layers to facilitate feature alignment. Additionally, semantic information is exchanged among the teacher network, the student network, and bidirectional encoder representations from transformers (BERT)-encoded labels. To obtain more comprehensive knowledge transfer, a decoupled knowledge distillation loss is utilized, enhancing the generalization of the network. Experimental evaluations conducted on three multimodal datasets, namely, UTD-MHAD, Berkeley-MHAD, and MMAct, demonstrate the superior performance of the proposed SMTDKD method over state-of-the-art human action recognition methods.
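The decoupled knowledge distillation loss mentioned above splits the classic KD objective into a target-class term (TCKD) and a non-target-class term (NCKD) that can be weighted independently. A minimal NumPy sketch of that decomposition is given below, following the standard DKD formulation; the temperature `T` and the weights `alpha`/`beta` are illustrative defaults, not values reported for SMTDKD.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dkd_loss(t_logits, s_logits, target, T=4.0, alpha=1.0, beta=8.0):
    """Decoupled KD loss = alpha * TCKD + beta * NCKD.

    t_logits, s_logits: teacher / student logits (1-D arrays)
    target: index of the ground-truth class
    """
    pt = softmax(t_logits / T)  # teacher class probabilities
    ps = softmax(s_logits / T)  # student class probabilities

    # TCKD: binary KL over (target, non-target) probability mass
    bt = np.array([pt[target], 1.0 - pt[target]])
    bs = np.array([ps[target], 1.0 - ps[target]])
    tckd = np.sum(bt * np.log(bt / bs))

    # NCKD: KL over the non-target classes, renormalized to sum to 1
    mask = np.arange(len(pt)) != target
    nt = pt[mask] / pt[mask].sum()
    ns = ps[mask] / ps[mask].sum()
    nckd = np.sum(nt * np.log(nt / ns))

    return alpha * tckd + beta * nckd
```

Weighting NCKD separately (beta > alpha) lets the "dark knowledge" among wrong classes contribute even when the teacher is very confident in the target class, which is the point of decoupling the two terms.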
Keywords
Transformers, Sensors, Feature extraction, Wearable sensors, Visualization, Semantics, Knowledge engineering, Decoupled knowledge distillation, human action recognition (HAR), multimodal, transformer, wearable sensor