Dynamic tracking attention model for action recognition

Chien Yao Wang, Chin Chin Chiang, Jian Jiun Ding, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Scopus citations

Abstract

This paper proposes a dynamic tracking attention model (DTAM) for recognizing human actions in a video sequence. The model mainly comprises a motion attention mechanism, a convolutional neural network (CNN), and a long short-term memory (LSTM) network. In the motion attention mechanism, local dynamic tracking tracks moving objects in the feature domain, and global dynamic tracking corrects the motion in the spectral domain. The CNN performs feature extraction, while the LSTM handles the sequential information about actions that is extracted from videos. The DTAM effectively captures information between consecutive frames in a video sequence and achieves a higher recognition rate than the CNN-LSTM. When the DTAM is combined with the visual attention model, the proposed algorithm achieves a recognition rate that is 3.6% and 4.5% higher than that of the CNN-LSTMs with and without the visual attention model, respectively.
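
For readers unfamiliar with the CNN-LSTM baseline that the abstract compares against, the following is a minimal sketch of that general pattern only: per-frame features are fed, one frame at a time, into an LSTM whose final hidden state is classified. It is not the authors' implementation. The motion attention mechanism is omitted because the abstract does not specify it, the CNN is replaced by a trivial stand-in (row means), and every name, size, and weight here is a hypothetical chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_features(frame):
    """Stand-in for the CNN (assumption): the mean of each row of a 2-D frame."""
    return [sum(row) / len(row) for row in frame]


def lstm_step(x, h, c, W, b):
    """One standard LSTM step.

    W has 4*H rows of length len(x) + H; b has 4*H entries.
    Gate order in W and b: input, forget, candidate, output.
    """
    H = len(h)
    z = x + h  # concatenate the frame features and the previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:H]]
    f = [sigmoid(p) for p in pre[H:2 * H]]
    g = [math.tanh(p) for p in pre[2 * H:3 * H]]
    o = [sigmoid(p) for p in pre[3 * H:4 * H]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_video(frames, W, b, hidden_size):
    """Run the LSTM over the frames in order and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for frame in frames:
        h, c = lstm_step(frame_features(frame), h, c, W, b)
    return h


def classify(h, V):
    """Linear read-out: index of the highest-scoring action class."""
    scores = [sum(v * x for v, x in zip(row, h)) for row in V]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage with random, untrained weights (all sizes are hypothetical).
rng = random.Random(0)
D, H, CLASSES, T = 3, 4, 3, 5  # feature size, hidden size, classes, frames
W = [[rng.uniform(-0.5, 0.5) for _ in range(D + H)] for _ in range(4 * H)]
b = [0.0] * (4 * H)
V = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(CLASSES)]
video = [[[rng.random() for _ in range(6)] for _ in range(D)] for _ in range(T)]

hidden = encode_video(video, W, b, H)
label = classify(hidden, V)
```

Because the weights are untrained, the predicted label carries no meaning; the sketch only shows how sequential information flows from frame-level features through the recurrent state to a class decision.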

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1617-1621
Number of pages: 5
ISBN (Electronic): 9781509041176
DOIs
State: Published - 16 Jun 2017
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country/Territory: United States
City: New Orleans
Period: 5/03/17 to 9/03/17

Keywords

  • Action recognition
  • attention model
  • convolutional neural network
  • deep learning
  • long short-term memory (LSTM)
