Image feature representation by bag-of-visual-words (BOVW) has been widely used in image classification problems. BOVW feature extraction is usually based on tokenizing the detected keypoints into visual words, so that the visual-word vector of an image records how often each visual word occurs in that image. To train and test an image classifier, the BOVW features of the training and testing images can be extracted either at the same time or separately. The aim of this paper is to compare the classification performance of these two feature extraction strategies. We show that there is no significant difference in classification performance between them, but that extracting the BOVW features from the training and testing images at the same time takes much longer. Therefore, the key criterion for choosing between the two BOVW feature extraction strategies is the dataset size.
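As an illustration only, the two strategies can be sketched in Python with NumPy. This is a minimal sketch under explicit assumptions, not the authors' pipeline. It assumes that the visual words come from a k-means codebook and that each keypoint descriptor is tokenized by assigning it to its nearest codebook center. It also interprets "at the same time" as building the codebook from the training and testing descriptors together, and "separately" as building it from the training descriptors only and then quantizing the testing images against it. The function names (`kmeans`, `bovw_histogram`, `extract_features`), the 128-dimensional descriptors, and the random data are all hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed codebook builder); returns a (k, d) array of visual words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every descriptor to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Tokenize each keypoint descriptor as its nearest visual word and count occurrences."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))

def extract_features(train_desc, test_desc, k, joint):
    """Hypothetical helper.
    joint=True:  codebook built from training and testing descriptors together.
    joint=False: codebook built from training descriptors only."""
    pool = train_desc + test_desc if joint else train_desc
    codebook = kmeans(np.vstack(pool), k)
    train_X = np.array([bovw_histogram(d, codebook) for d in train_desc])
    test_X = np.array([bovw_histogram(d, codebook) for d in test_desc])
    return train_X, test_X

if __name__ == "__main__":
    # Synthetic stand-ins for per-image keypoint descriptors (assumed 128-D).
    rng = np.random.default_rng(1)
    train_desc = [rng.normal(size=(int(rng.integers(20, 40)), 128)) for _ in range(6)]
    test_desc = [rng.normal(size=(int(rng.integers(20, 40)), 128)) for _ in range(3)]
    for joint in (False, True):
        tr, te = extract_features(train_desc, test_desc, k=10, joint=joint)
        print("joint" if joint else "separate", tr.shape, te.shape)
```

Under this interpretation, the joint strategy clusters every descriptor from both sets. Its codebook step therefore processes more data than the separate strategy, and it must be rerun whenever new testing images arrive. This fits the abstract's observation that the cost depends on dataset size.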