Image retrieval systems aim to retrieve images that are relevant to users' queries. They automatically index images by extracting low-level visual features, such as colour, texture, and shape, and retrieval is based solely upon the indexed features. However, the low-level features extracted and indexed by computers do not correspond directly to the high-level concepts (or semantics) of users' queries. Image classification offers a solution to this problem by automatically assigning images to categories. In the literature, images are usually segmented into a number of local blocks or regions, whose low-level features are extracted to represent image content. However, usually only one of the two representations, block-based or region-based, is considered, and each local feature is usually associated with a class label (category) for image classification. This paper examines, in terms of classification accuracy, the applicability of combining contextual image features (i.e. the segmented blocks and regions of an image) to represent image content. The experimental results show that the combined block-based feature of an image outperforms both the block-based and region-based features used individually and the hybrid feature that combines block and region features.
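
To make the notion of a combined block-based feature concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes that an image is divided into a fixed grid of blocks, that each block is described by its mean colour, and that the block features are combined by simple concatenation. The function names `block_features` and `combined_block_feature`, the 2 x 2 grid, and the choice of mean colour as the low-level feature are all hypothetical choices made for illustration.

```python
def block_features(image, rows=2, cols=2):
    """Split an image into rows x cols blocks and return one mean-colour
    feature per block.

    `image` is a list of pixel rows, each pixel an (r, g, b) tuple.
    Assumption: the image has at least `rows` pixel rows and `cols` pixel
    columns, so that no block is empty.
    """
    h, w = len(image), len(image[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            # Collect the pixels that fall inside block (r, c).
            pixels = [image[y][x]
                      for y in range(r * h // rows, (r + 1) * h // rows)
                      for x in range(c * w // cols, (c + 1) * w // cols)]
            n = len(pixels)
            # Mean colour per channel stands in for a low-level feature.
            feats.append(tuple(sum(p[ch] for p in pixels) / n
                               for ch in range(3)))
    return feats


def combined_block_feature(image, rows=2, cols=2):
    """Concatenate all block features into a single vector for the image
    (an assumed combination rule, chosen only for illustration)."""
    return [v for feat in block_features(image, rows, cols) for v in feat]
```

With a 2 x 2 grid, `block_features` yields four local features that could each be classified separately, whereas `combined_block_feature` yields one 12-dimensional vector describing the whole image.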