TY - JOUR
T1 - Teaching Yourself
T2 - A Self-Knowledge Distillation Approach to Action Recognition
AU - Vu, Duc Quang
AU - Le, Ngan
AU - Wang, Jia Ching
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - Knowledge distillation, the process of transferring complex knowledge learned by a heavy network, i.e., a teacher, to a lightweight network, i.e., a student, has emerged as an effective technique for compressing neural networks. To reduce the necessity of training a large teacher network, this paper leverages the recent self-knowledge distillation approach to train a student network progressively by distilling its own knowledge without a pre-trained teacher network. Unlike existing self-knowledge distillation methods, which mainly focus on still images, our proposed Teaching Yourself is a self-knowledge distillation technique that targets videos for human action recognition. Our proposed Teaching Yourself is designed not only as an effective lightweight network but also as a model with high generalization capability. In our approach, the network updates itself using its best past model, termed the preceding model, which then guides the training process to update the present model. Inspired by consistency training in state-of-the-art semi-supervised learning methods, we also introduce an effective augmentation strategy to increase data diversity and improve network generalization and prediction consistency in our proposed Teaching Yourself approach. Our benchmarks are conducted on both the 3D ResNet-18 and 3D ResNet-50 backbone networks and evaluated on standard datasets, including UCF101, HMDB51, and Kinetics400. The experimental results show that our Teaching Yourself method significantly improves action recognition accuracy compared to existing supervised learning and knowledge distillation methods. We have also conducted an extensive ablation study to demonstrate that our approach mitigates overconfident predictions on dark knowledge and generates more consistent predictions across input variations of the same data point. The code is available at https://github.com/vdquang1991/Self-KD.
AB - Knowledge distillation, the process of transferring complex knowledge learned by a heavy network, i.e., a teacher, to a lightweight network, i.e., a student, has emerged as an effective technique for compressing neural networks. To reduce the necessity of training a large teacher network, this paper leverages the recent self-knowledge distillation approach to train a student network progressively by distilling its own knowledge without a pre-trained teacher network. Unlike existing self-knowledge distillation methods, which mainly focus on still images, our proposed Teaching Yourself is a self-knowledge distillation technique that targets videos for human action recognition. Our proposed Teaching Yourself is designed not only as an effective lightweight network but also as a model with high generalization capability. In our approach, the network updates itself using its best past model, termed the preceding model, which then guides the training process to update the present model. Inspired by consistency training in state-of-the-art semi-supervised learning methods, we also introduce an effective augmentation strategy to increase data diversity and improve network generalization and prediction consistency in our proposed Teaching Yourself approach. Our benchmarks are conducted on both the 3D ResNet-18 and 3D ResNet-50 backbone networks and evaluated on standard datasets, including UCF101, HMDB51, and Kinetics400. The experimental results show that our Teaching Yourself method significantly improves action recognition accuracy compared to existing supervised learning and knowledge distillation methods. We have also conducted an extensive ablation study to demonstrate that our approach mitigates overconfident predictions on dark knowledge and generates more consistent predictions across input variations of the same data point. The code is available at https://github.com/vdquang1991/Self-KD.
KW - Self-knowledge distillation
KW - action recognition
KW - convolutional neural network
KW - deep learning
KW - knowledge distillation
KW - self-learning
UR - http://www.scopus.com/inward/record.url?scp=85112621298&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3099856
DO - 10.1109/ACCESS.2021.3099856
M3 - Journal article
AN - SCOPUS:85112621298
SN - 2169-3536
VL - 9
SP - 105711
EP - 105723
JO - IEEE Access
JF - IEEE Access
M1 - 9495804
ER -