Separable ConvNet Spatiotemporal Mixer for Action Recognition

Hsu Yung Cheng, Chih Chang Yu, Chenyu Li

Research output: Contribution to journal, journal article, peer-reviewed

Abstract

Video action recognition is an important task in computer vision. In this paper, we develop a novel model named Separable ConvNet Spatiotemporal Mixer (SCSM). Our goal is an efficient, lightweight action recognition backbone that can be applied to multi-task models to increase their accuracy and processing speed. The SCSM model uses a new hierarchical spatial compression scheme together with a spatiotemporal fusion method that consists of a spatial domain and a temporal domain. In the spatial domain, the model keeps each frame independent during feature extraction; in the temporal domain, it fuses the spatiotemporal features. Because of its high scalability, the architecture can be adapted to different frame rate requirements. Its low prediction and training costs make it suitable as a backbone for multi-task video feature extraction or for industrial applications. According to the experimental results, SCSM has few parameters and low computational complexity, and it shows high scalability and strong transfer learning capabilities. The model achieves video action recognition accuracy comparable to state-of-the-art models with fewer parameters and lower computational requirements.
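The two-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the layers, so 2x2 average pooling stands in for the hierarchical spatial compression and a fixed weighted sum stands in for the learned temporal mixing. All function names and the example clip are hypothetical. The sketch only shows that the spatial stage processes each frame on its own and that frames interact only in the temporal stage.

```python
# Toy sketch only: average pooling and a fixed weighted sum stand in for
# the learned layers, which the abstract does not specify.

def pool2x2(frame):
    """Halve a 2D frame (list of rows) by averaging non-overlapping 2x2 blocks."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[i][j] + frame[i][j + 1]
             + frame[i + 1][j] + frame[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def spatial_features(frame, levels):
    """Compress one frame over several pooling levels, then flatten it."""
    for _ in range(levels):
        frame = pool2x2(frame)
    return [value for row in frame for value in row]


def temporal_mix(per_frame_features, weights):
    """Fuse per-frame feature vectors using one weight per frame."""
    if len(per_frame_features) != len(weights):
        raise ValueError("need exactly one weight per frame")
    dim = len(per_frame_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, per_frame_features))
        for d in range(dim)
    ]


def encode_clip(clip, levels, weights):
    """Spatial stage per frame (no cross-frame interaction), then temporal fusion."""
    per_frame = [spatial_features(frame, levels) for frame in clip]
    return temporal_mix(per_frame, weights)


# Hypothetical 3-frame clip of 4x4 single-channel frames; frame t is filled with t.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(3)]
clip_feature = encode_clip(clip, levels=1, weights=[1 / 3, 1 / 3, 1 / 3])
```

Because the spatial stage never looks across frames, the same `spatial_features` function can be reused for clips of any length; only the temporal weights need to match the frame count. That is one plausible reading of how such a design adapts to different frame rates, not a claim about the paper's mechanism.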

Original language: English
Article number: 496
Journal: Electronics (Switzerland)
Volume: 13
Issue number: 3
DOIs
Publication status: Published - Feb 2024
