This paper proposes an efficient framework that recognizes hand typing motions and gestures to build a virtual keyboard using a single RGB camera. Several prior works in the human-computer interaction (HCI) area address virtual keyboards; most rely on hand pose estimation, hand shape, or external equipment (depth sensors, Leap Motion, control gloves, touch screens, etc.). In contrast, our framework requires no additional equipment or prior experience from users: typing is performed in the air, much as on a real QWERTY keyboard. It uses a convolutional neural network (CNN) to classify two hand typing gestures (touch and non-touch). We also train an 11-gesture model, covering the non-touch state and a touch gesture for each of the ten fingers of the two hands. The proposed CNN model achieves 99.2% classification accuracy in the 2-gesture case and 91% in the 11-gesture case.
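The paper does not specify the CNN architecture in the abstract, but the 11-class setup (one non-touch state plus a touch gesture per finger) can be illustrated with a minimal NumPy sketch of a CNN forward pass. All shapes here are assumptions for illustration: a 32x32 grayscale crop of the hand region, a single 3x3 filter, and one dense layer mapping to 11 class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical shapes (not from the paper): 32x32 hand crop, 3x3 filter,
# dense layer to 11 classes (non-touch + 10 fingertip touch gestures).
image = rng.random((32, 32))
kernel = rng.standard_normal((3, 3))
W = rng.standard_normal((15 * 15, 11)) * 0.01  # 30x30 conv map pooled to 15x15
b = np.zeros(11)

feat = max_pool(np.maximum(conv2d(image, kernel), 0))  # conv -> ReLU -> pool
probs = softmax(feat.reshape(-1) @ W + b)              # flatten -> dense -> softmax

print(probs.shape)  # (11,) -- one probability per gesture class
```

A trained model would learn the kernel and dense weights from labeled typing-gesture frames; here they are random, so the sketch only demonstrates the data flow from image to an 11-way class distribution.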