In this work, we propose a speech emotion recognition system that jointly uses regression and classification models. The system achieves an accuracy of 64.70% on the dataset of mixed scripted and improvised scenes, and up to 66.34% on the dataset containing only improvised scenes. Compared with the state-of-the-art method that does not use mental states, the proposed method improves accuracy by 2.95% on the improvised scenes and by 2.09% on the mixed scenes. These results show that mental-state features can effectively improve the performance of speech emotion recognition.