Enhancing Single Object Tracking With a Hybrid Approach: Temporal Convolutional Networks, Attention Mechanisms, and Spatial-Temporal Memory

Pimpa Cheewaprakobkit, Chih Yang Lin, Timothy K. Shih, Avirmed Enkhbat

Research output: Contribution to journal › Article › peer-review

Abstract

Deep neural network-based trackers have advanced significantly in recent years. However, they still struggle to adapt to appearance changes in both the target and the background, and to link objects across long time gaps. The central difficulty in tracking is that a target's appearance changes frequently during the tracking process, which can reduce tracker robustness under aspect-ratio changes, occlusion, scale variation, and confusion with similar objects. To address this challenge, we propose a tracking architecture that combines a temporal convolutional network (TCN) and an attention mechanism with a spatial-temporal memory. The TCN enables the model to capture temporal dependencies, while the attention mechanism reduces computational complexity by focusing on the most relevant regions given the context. We use the target's historical information, stored in the spatial-temporal memory network, to help the tracker adapt to target deformation. Our model achieves an average overlap (AO) of 67.5% on the GOT-10K dataset, a success score (AUC) of 72.1% on OTB2015 and of 65.8% on UAV123, and an accuracy of 59.0% on VOT2018. These results demonstrate that the proposed tracker is highly effective for single object tracking.
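
The abstract combines three components: a TCN for temporal dependencies, attention for focusing on relevant regions, and a spatial-temporal memory of the target's history. As a rough illustration only, the NumPy sketch below shows one common way such pieces can fit together: a causal, dilated TCN summarises each spatial position's feature history, and scaled dot-product attention lets the current frame read from that memory. All shapes, layer counts, random weights and function names (`causal_dilated_conv1d`, `tcn`, `memory_read`) are hypothetical assumptions; this is not the authors' architecture, and it does not model the paper's complexity savings.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution along time.

    x: (T, C) per-frame features; w: (K, C, C_out) kernel.
    Output frame t only sees frames t, t - d, ..., t - (K - 1) * d.
    """
    T, C = x.shape
    K = w.shape[0]
    pad = (K - 1) * dilation
    # Left-pad with zeros so that no output frame depends on future frames.
    xp = np.concatenate([np.zeros((pad, C)), x], axis=0)
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        start = pad - k * dilation  # tap k reads k * dilation frames into the past
        out += xp[start:start + T] @ w[k]
    return out


def tcn(x, layer_weights):
    """Stack of residual causal conv layers with dilations 1, 2, 4, ..."""
    h = x
    for i, w in enumerate(layer_weights):
        h = h + np.maximum(causal_dilated_conv1d(h, w, 2 ** i), 0.0)  # ReLU + residual
    return h


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def memory_read(query, mem_keys, mem_values):
    """Scaled dot-product attention from current-frame positions to memory entries."""
    scores = query @ mem_keys.T / np.sqrt(query.shape[-1])  # (N_query, N_memory)
    attn = softmax(scores, axis=-1)                          # each row sums to 1
    return attn @ mem_values, attn


# Hypothetical sizes: T past frames, N spatial positions per frame, D channels, kernel K.
rng = np.random.default_rng(0)
T, N, D, K = 8, 16, 32, 3
past = rng.standard_normal((T, N, D))    # stand-in for backbone features of past frames
current = rng.standard_normal((N, D))    # stand-in for current search-region features

# 1) Temporal modelling: run the TCN over each spatial position's feature history.
#    Three layers with K = 3 give a receptive field of 1 + 2 * (1 + 2 + 4) = 15 frames.
layer_weights = [0.1 * rng.standard_normal((K, D, D)) for _ in range(3)]
temporal = np.stack([tcn(past[:, n, :], layer_weights) for n in range(N)], axis=1)  # (T, N, D)

# 2) Spatial-temporal memory: flatten time and space into T * N entries
#    (keys and values share the same features here purely for brevity).
memory = temporal.reshape(T * N, D)

# 3) Attention read: each current position retrieves target history, fused residually.
read, attn = memory_read(current, memory, memory)
fused = current + read
print(fused.shape, attn.shape)  # (16, 32) (16, 128)
```

Causal padding keeps the temporal branch usable online, because a frame's features never depend on frames that have not arrived yet, while the doubling dilations let a few layers cover a long history.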

Original language: English
Pages (from-to): 139211-139222
Number of pages: 12
Journal: IEEE Access
Volume: 11
State: Published - 2023

Keywords

  • Temporal convolutional networks
  • Attention mechanism
  • Single object tracking
  • Spatial-temporal memory

