Efficient Visual Tracking Using Local Information Patch Attention Free Transformer

Pin Feng Wang, Chih Wei Tang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The state-of-the-art (SOTA) transformer tracker TransT achieves high tracking accuracy. However, the time and space complexity of its attention operation is quadratic in the spatial dimension of the feature vectors, which makes TransT difficult to deploy on resource-constrained devices. This paper proposes the Local Information Patch Self-Attention Free Transformer (LIPS-AFT) and the Local Information Patch Cross-Attention Free Transformer (LIPC-AFT), both based on the Local Information Patch Attention Free Transformer (LIP-AFT), to achieve linear time and space complexity with high accuracy. LIP-AFT benefits from global connectivity between patches while focusing on naïve strong local attention patterns. The proposed tracker outperforms both SOTA trackers and TransT with various SOTA attention algorithms in accuracy and complexity. Moreover, its inference phase runs at 41 fps on RTX 2070S GPUs.
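
The quadratic-versus-linear contrast described in the abstract can be illustrated with a minimal sketch. This is not the LIP-AFT method, whose details are not given in this record; it assumes the generic "AFT-simple" formulation from the attention-free transformer literature as a stand-in, and the function names (`dot_product_attention`, `aft_simple`) are hypothetical. Standard attention scores every pair of positions, so its work grows with N squared, whereas the attention-free variant pools keys and values once per channel and then gates the result per position, so its work grows with N.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot_product_attention(q, k, v):
    """Standard attention: N rows of N scores each, i.e. quadratic in N."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            s = sum(q[i][c] * k[j][c] for c in range(d))
            scores.append(s / math.sqrt(d))
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(d)])
    return out


def aft_simple(q, k, v):
    """AFT-simple style (assumed stand-in): one pooled context, linear in N."""
    n, d = len(q), len(q[0])
    context = []
    for c in range(d):
        # Softmax over positions, computed separately for each channel.
        w = softmax([k[j][c] for j in range(n)])
        context.append(sum(w[j] * v[j][c] for j in range(n)))
    # Each position gates the shared context with a sigmoid of its query.
    return [[sigmoid(q[i][c]) * context[c] for c in range(d)] for i in range(n)]
```

Both functions take `q`, `k`, and `v` as lists of N feature vectors of length d and return a list of the same shape. The first touches N × N position pairs; the second makes two passes over the N positions.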

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 447-448
Number of pages: 2
ISBN (Electronic): 9781665470506
State: Published - 2022
Event: 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022 - Taipei, Taiwan
Duration: 6 Jul 2022 → 8 Jul 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022

Conference

Conference: 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022
Country/Territory: Taiwan
City: Taipei
Period: 6/07/22 → 8/07/22

Keywords

  • attention
  • space and time complexity
  • tracking
  • transformer
