TY - JOUR
T1 - HAPiCLR
T2 - heuristic attention pixel-level contrastive loss representation learning for self-supervised pretraining
AU - Tran, Van Nhiem
AU - Liu, Shen Hsuan
AU - Huang, Chi En
AU - Aslam, Muhammad Saqlain
AU - Yang, Kai Lin
AU - Li, Yung-Hui
AU - Wang, Jia Ching
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
PY - 2024
Y1 - 2024
N2 - Recent self-supervised contrastive learning methods are powerful and efficient for robust representation learning: they pull together semantic features from different cropped views of the same image while pushing them away from the features of other images in the embedding space. However, training contrastive models is quite inefficient, since images in a high-dimensional vector space can differ from one another in many ways. We address this problem with heuristic attention pixel-level contrastive loss representation learning (HAPiCLR), a self-supervised joint-embedding contrastive framework that operates at the pixel level and exploits heuristic mask information. HAPiCLR leverages pixel-level information from the object’s contextual representation instead of identifying pair-wise differences between instance-level representations. It thus enhances contrastive learning objectives without requiring large batch sizes, memory banks, or queues, reducing both the memory footprint and the processing needed for large datasets. Furthermore, combining the HAPiCLR loss with other contrastive objectives such as the SimCLR or MoCo loss yields considerable performance gains on all downstream tasks, including image classification, object detection, and instance segmentation.
KW - Object contextual representation
KW - Pixel-level attention
KW - Pixel-level contrastive learning
KW - Self-supervised learning
KW - Visual representation learning
UR - http://www.scopus.com/inward/record.url?scp=85187936231&partnerID=8YFLogxK
U2 - 10.1007/s00371-023-03217-x
DO - 10.1007/s00371-023-03217-x
M3 - Journal article
AN - SCOPUS:85187936231
SN - 0178-2789
JO - Visual Computer
JF - Visual Computer
ER -