Video analytics has improved rapidly over the past decade. Recent developments in computer vision make it possible to mine massive volumes of video data to understand what is happening in the world. Owing to the remarkable success of deep learning, video analysis performance can now be improved significantly over traditional statistical approaches. This study focuses on classifying patterns in firework videos, a task beyond the common types of pattern classification, using various deep learning techniques that exploit spatial and temporal features. Among successful artificial neural networks, Convolutional Neural Networks (CNNs) have demonstrated superiority in modeling high-level visual concepts, while Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks have proven effective at modeling temporal dynamics in video-based pattern classification. Our basic models consist of CNN, LSTM, and GRU components, and we conducted experiments by fine-tuning the layer parameters and using different dropout values with the sequence LSTM and GRU models. Our experimental results demonstrate that the model with a sequence of LSTM units and two dropout layers (one for the input and another for the hidden layers) outperforms the other experimental models, with a training accuracy of 83.05%.