An Interpretable Visual Attention Plug-in for Convolutions

Chih Yang Lin, Chia Lin Wu, Hui Fuang Ng, Timothy K. Shih

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Raw images, which may contain many noisy background pixels, are typically used in convolutional neural network (CNN) training. This paper proposes a novel variance loss function based on a ground-truth mask of the target object to enhance the visual attention of a CNN. The loss function regularizes the training process so that the feature maps in the later convolutional layers focus more on target object areas and less on the background. The attention loss is computed directly from the feature maps, so no new parameters are added to the backbone network; therefore, no extra computational cost is incurred in the testing phase. The proposed attention model can be a plug-in for any pre-trained network architecture and can be used in conjunction with other attention models. Experimental results demonstrate that the proposed variance loss function improves classification accuracy by 2.22% over the baseline on the Stanford Dogs dataset, which is significantly higher than the improvements achieved by SENet (0.3%) and CBAM (1.14%). Our method also improves object detection accuracy by 2.5 mAP on the Pascal-VOC2007 dataset and store-sign detection by 2.66 mAP over the respective baseline models. Furthermore, the proposed loss function enhances the visualization and interpretability of a CNN.
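The abstract describes a loss computed directly from late-layer feature maps and a ground-truth object mask, with no added parameters. The exact variance formulation is not given here, so the following is only a minimal illustrative sketch of one plausible mask-guided attention penalty: it normalizes the channel-averaged activation map and penalizes the fraction of activation mass that falls on background pixels. The function name, shapes, and formula are assumptions for illustration, not the paper's published definition.

```python
import numpy as np

def attention_loss(feature_maps, mask, eps=1e-8):
    """Illustrative mask-guided attention penalty (an assumed stand-in,
    NOT the paper's exact variance loss, which the abstract does not state).

    feature_maps: (C, H, W) activations from a late convolutional layer.
    mask:         (H, W) binary ground-truth object mask (1 = object).

    Returns the fraction of normalized activation energy that falls
    outside the object mask; minimizing it pushes the network's
    attention onto the target object and away from the background.
    """
    # Channel-averaged activation magnitude, normalized to sum to 1
    # so the loss is scale-invariant with respect to feature amplitude.
    attn = np.abs(feature_maps).mean(axis=0)
    attn = attn / (attn.sum() + eps)
    # Attention mass on background pixels (mask == 0).
    background = 1.0 - mask
    return float((attn * background).sum())
```

Because such a term depends only on existing activations, it adds no parameters to the backbone and can be dropped from the graph at test time, matching the plug-in behavior the abstract describes.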

Pages (from-to): 136992-137003
Journal: IEEE Access
Publication status: Published - 2020
