TY - JOUR
T1 - Enhancing Single Object Tracking With a Hybrid Approach
T2 - Temporal Convolutional Networks, Attention Mechanisms, and Spatial-Temporal Memory
AU - Cheewaprakobkit, Pimpa
AU - Lin, Chih Yang
AU - Shih, Timothy K.
AU - Enkhbat, Avirmed
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2023
Y1 - 2023
N2 - Deep neural network-based tracking tasks have experienced significant advancements in recent years. However, these networks continue to face challenges in effectively adapting to appearance changes in both target and background, as well as linking objects after extended periods. The primary challenge in tracking lies in the frequent changes in a target's appearance throughout the tracking process, which can potentially reduce tracker robustness when faced with issues such as aspect ratio changes, occlusions, scale variations, and confusion from similar objects. To address this challenge, we propose a tracking architecture that combines a temporal convolutional network (TCN) and attention mechanism with spatial-temporal memory. The TCN component empowers the model to capture temporal dependencies, while the attention mechanism reduces computational complexity by focusing on crucial regions based on context. We leverage the target's historical information stored in the spatial-temporal memory network to guide the tracker in better adapting to target deformation. Our model attains a 67.5% average overlap (AO) on the GOT-10K dataset, a 72.1% success score (AUC) on OTB2015, a 65.8% success score (AUC) on UAV123, and achieves 59.0% accuracy on the VOT2018 dataset. These outcomes demonstrate the high effectiveness of our proposed tracker in tracking a single object.
AB - Deep neural network-based tracking tasks have experienced significant advancements in recent years. However, these networks continue to face challenges in effectively adapting to appearance changes in both target and background, as well as linking objects after extended periods. The primary challenge in tracking lies in the frequent changes in a target's appearance throughout the tracking process, which can potentially reduce tracker robustness when faced with issues such as aspect ratio changes, occlusions, scale variations, and confusion from similar objects. To address this challenge, we propose a tracking architecture that combines a temporal convolutional network (TCN) and attention mechanism with spatial-temporal memory. The TCN component empowers the model to capture temporal dependencies, while the attention mechanism reduces computational complexity by focusing on crucial regions based on context. We leverage the target's historical information stored in the spatial-temporal memory network to guide the tracker in better adapting to target deformation. Our model attains a 67.5% average overlap (AO) on the GOT-10K dataset, a 72.1% success score (AUC) on OTB2015, a 65.8% success score (AUC) on UAV123, and achieves 59.0% accuracy on the VOT2018 dataset. These outcomes demonstrate the high effectiveness of our proposed tracker in tracking a single object.
KW - Temporal convolutional networks
KW - attention mechanism
KW - single object tracking
KW - spatial-temporal memory
UR - http://www.scopus.com/inward/record.url?scp=85177089160&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3330644
DO - 10.1109/ACCESS.2023.3330644
M3 - 期刊論文
AN - SCOPUS:85177089160
SN - 2169-3536
VL - 11
SP - 139211
EP - 139222
JO - IEEE Access
JF - IEEE Access
ER -