In video indexing and summarization, videotext provides compact and accurate information. Most videotext detection and extraction methods handle only static videotext on video frames; few handle moving videotext efficiently, because moving videotext is difficult to extract well. In this paper, we propose a two-directional videotext extractor, called 2DVTE, developed as an integrated system to detect, localize, and extract scrolling videotext. First, the detection method uses edge information to classify regions as text or non-text. Second, to localize scrolling videotext, we propose a two-dimensional projection profile method that uses horizontal and vertical edge maps. Considering the characteristics of Chinese text, the vertical edge map is used to localize the candidate text region and the horizontal edge map is used to refine it. Third, the extraction method consists of dual-mode adaptive thresholding and a multi-seed filling algorithm. The dual-mode adaptive thresholding produces a non-rectangular pattern that separates background from foreground more precisely. The multi-seed filling algorithm considers the minimum and maximum stroke length and four stroke directions, whereas the previous method considers only the minimum length and two directions. Exploiting multiple seeds on strokes in this way yields precise seeds and higher-quality extracted videotext. To meet high-throughput and low-complexity requirements, the system detects, localizes, and extracts scrolling videotext in real time using only a single frame, instead of the multi-frame integration used elsewhere in the literature. In experiments on various video sequences, all horizontally and vertically scrolling videotext was extracted precisely. We also compare against other methods; in our analysis, our algorithm outperforms existing methods in both speed and quality.
- Edge detection
- Text detection
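
The two-dimensional projection profile localization described in the abstract can be illustrated with a minimal sketch. This is not the 2DVTE implementation: the gradient operator (simple pixel differences), the `ratio` threshold, and the function names `edge_maps`, `longest_run`, and `localize_text` are all assumptions made for illustration. It only shows the general idea of projecting the vertical edge map to find a candidate text band and then projecting the horizontal edge map to refine it.

```python
import numpy as np


def edge_maps(gray):
    """Return (vertical, horizontal) edge maps from absolute pixel differences.

    Assumption: plain first differences stand in for whatever edge
    operator the paper actually uses.
    """
    g = np.asarray(gray, dtype=np.float64)
    vertical = np.zeros_like(g)
    horizontal = np.zeros_like(g)
    # Vertical edges respond to intensity changes along the x axis.
    vertical[:, 1:] = np.abs(np.diff(g, axis=1))
    # Horizontal edges respond to intensity changes along the y axis.
    horizontal[1:, :] = np.abs(np.diff(g, axis=0))
    return vertical, horizontal


def longest_run(mask):
    """Return (start, end) of the longest run of True values, end exclusive."""
    best, start = None, None
    for i, value in enumerate(list(mask) + [False]):
        if value and start is None:
            start = i
        elif not value and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def localize_text(gray, ratio=0.5):
    """Return (top, bottom, left, right) of a candidate text box, or None.

    `ratio` is a hypothetical threshold expressed as a fraction of the
    profile peak; the paper does not specify this value here.
    """
    vertical, horizontal = edge_maps(gray)

    # Step 1: project the vertical edge map onto the y axis to find the band.
    row_profile = vertical.sum(axis=1)
    if row_profile.max() == 0:
        return None
    top, bottom = longest_run(row_profile >= ratio * row_profile.max())

    # Step 2: project the horizontal edge map inside the band onto the x axis
    # to refine the left and right boundaries.
    col_profile = horizontal[top:bottom].sum(axis=0)
    if col_profile.max() == 0:
        return None
    cols = np.flatnonzero(col_profile >= ratio * col_profile.max())
    return int(top), int(bottom), int(cols[0]), int(cols[-1]) + 1


# Usage: a synthetic frame with a high-contrast textured block as "text".
frame = np.zeros((40, 60))
rows, cols = np.indices((10, 30))
frame[10:20, 15:45] = 255 * ((rows + cols) % 2)
box = localize_text(frame)
```

Because the sketch works on a single frame, it mirrors the single-frame design goal stated in the abstract; a real system would likely need a more robust edge operator and thresholds tuned to the video source.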