Classification and numeric estimation are the two most common types of data mining. The goal of classification is to predict the discrete type of output values whereas estimation is aimed at finding the continuous type of output values. Predictive data mining is generally achieved by using only one specific statistical or machine learning technique to construct a prediction model. Related studies have shown that prediction performance by this kind of single flat model can be improved by the utilization of some hierarchical structures. Hierarchical estimation approaches, usually a combination of multiple estimation models, have been proposed for solving some specific domain problems. However, in the literature, there is no generic hierarchical approach for estimation and no hybrid based solution that combines classification and estimation techniques hierarchically. Therefore, we introduce a generic hierarchical architecture, namely hierarchical classification and regression (HCR), suitable for various estimation problems. Simply speaking, the first level of HCR involves pre-processing a given training set by classifying it into k classes, leading to k subsets. Three approaches are used to perform this task in this study: hard classification (HC); fuzzy c-means (FCM); and genetic algorithms (GA). Then, each training data containing its associated class label is used to train a support vector machine (SVM) classifier for classification. Next, for the second level of HCR, k regression (or estimation) models are trained based on their corresponding subsets for final prediction. The experiments based on 8 different UCI datasets show that most hierarchical prediction models developed with the HCR architecture significantly outperform three well-known single flat prediction models, i.e., linear regression (LR), multilayer perceptron (MLP) neural networks, and support vector regression (SVR) in terms of mean absolute percentage error (MAPE) and root mean squared error (RMSE) rates. In addition, it is found that using the GA-based data pre-processing approach to classify the training set into 4 subsets is the best threshold (i.e., k=4) and the 4-class SVM+MLP outperforms three baseline hierarchical regression models.