Visual object tracking is an active research area in computer vision. It aims to estimate the location of a target object across video frames. In the past few years, deep learning methods have been widely used in object tracking to improve accuracy. However, challenges remain in both performance and accuracy. This study aims to enhance the performance of an object detection model by focusing on single-object tracking, using a Siamese network architecture and a correlation filter to find the relationship between the target object and the search object in a sequence of consecutive images. We mitigate some challenging problems of the Siamese network by adding a variance loss that improves the model's ability to distinguish the foreground from the background. Furthermore, we add an attention mechanism and process the cropped image to find the relationships between objects. Our experiments used the VOT2019 dataset for testing object tracking and the CUHK03 dataset for training the model. The results demonstrate that the proposed model achieves promising prediction performance in handling image occlusion and reducing false alarms from object detection. We achieved an accuracy of 0.608, a robustness of 0.539, and an expected average overlap (EAO) score of 0.217. Our tracker runs at approximately 26 fps on a GPU.
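
The correlation step that relates the target to the search image can be illustrated with a minimal sketch. It assumes plain, unnormalized cross-correlation on small 2D grids of numbers standing in for feature maps. The function names `cross_correlate` and `locate_peak` and the toy values are hypothetical, and the sketch does not include the learned embeddings, variance loss, or attention mechanism described above.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Both arguments are 2D lists of numbers. Only positions where the
    template fits entirely inside the search grid are evaluated.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Sum of element-wise products between the template and the window.
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def locate_peak(response):
    """Return the (row, col) of the highest score in the response map."""
    best_pos, best_val = (0, 0), response[0][0]
    for y, row in enumerate(response):
        for x, val in enumerate(row):
            if val > best_val:
                best_pos, best_val = (y, x), val
    return best_pos


# Toy stand-ins for feature maps (illustrative values only).
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

response = cross_correlate(search, template)
peak = locate_peak(response)  # top-left corner of the best-matching window
```

In this toy case the peak of the response map falls on the window that contains the template pattern. Because the scores here are unnormalized, a bright background region could outscore the true target in real images, which is one reason trackers of this kind correlate learned features rather than raw pixel values.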