TY - GEN
T1 - Early versus late dimensionality reduction of bag-of-words feature representation for image classification
AU - Tsai, Chih Fong
AU - Hu, Ya Han
AU - Lin, Wei Chao
AU - Wang, Ming Chang
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/12/8
Y1 - 2017/12/8
AB - Bag-of-words (BoW) features extracted from images are widely used for image classification. In general, local keypoints are first detected in each image, and a descriptor such as the scale-invariant feature transform (SIFT) is extracted for each keypoint. The keypoint descriptors of a given image dataset are then tokenized (or clustered) to generate a visual-word vocabulary (or codebook). Next, each image is described by a visual-word vector that records the occurrence of each visual word in the image, e.g. the number of keypoints assigned to the corresponding cluster (i.e. visual word). Consequently, images are represented by histograms over visual words. Since both the SIFT keypoint descriptor and the final BoW feature are high-dimensional, this paper examines the effect on classification accuracy of performing dimensionality reduction (DR) on each of the two features. In particular, early DR is applied to the SIFT descriptor and late DR to the BoW feature. Experimental results on the Caltech 101 (2-D images) and ESB (3-D images) datasets show that reducing the dimensionality of the SIFT descriptor by 50% with PCA allows the SVM classifier to perform similarly to the one without DR. On the other hand, late DR only works for 2-D images, and the classification performance of the SVM cannot be maintained if the dimensionality of the BoW feature is reduced by more than 25%.
KW - Bag-of-words
KW - Dimensionality reduction
KW - Feature selection
KW - Image classification
KW - Principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=85041907023&partnerID=8YFLogxK
U2 - 10.1145/3175587.3175598
DO - 10.1145/3175587.3175598
M3 - Conference contribution
AN - SCOPUS:85041907023
T3 - ACM International Conference Proceeding Series
SP - 42
EP - 45
BT - Proceedings of 2017 International Conference on Bioinformatics Research and Applications, ICBRA 2017
PB - Association for Computing Machinery
T2 - 2017 International Conference on Bioinformatics Research and Applications, ICBRA 2017
Y2 - 8 December 2017 through 10 December 2017
ER -