Popular deep learning models for text segmentation include CTPN, EAST, and PixelLink. However, they do not handle images well when the characters are densely distributed and may be connected to one another. To address these problems, ResNet, which is highly sensitive in feature extraction, is used to replace the embedded convolutional neural networks in the main structures of CTPN and EAST. The experimental results show that a better feature extraction network can significantly improve the precision of text localization. Notably, the results indicate that accuracy increases with the depth and width of the ResNet, with the modified EAST using ResNet101 achieving the highest accuracy. Its text segmentation accuracy on ICDAR 2015 is 83.4%, which is 7% higher than that of the original PVANET-EAST. Its text detection accuracy is 83.9% on scanned documents that were not used for training, and 86.3% on self-collected Chinese calligraphy. These results demonstrate that using ResNet for feature extraction improves text detection for OCR applications.
- Convolutional neural network
- Deep learning models
- Feature extraction
- Optical character recognition
- Text segmentation