Abstract
In this paper, a feature-based document analysis system is presented that uses domain knowledge to segment and classify mixed text/graphics/image documents. In our approach, we first apply a run-length smearing operation, followed by a stripe-merging procedure, to segment the blocks embedded in a document. Each block is then classified using domain knowledge induced from the primitives associated with each type of medium. Proper use of domain knowledge is shown to be effective in speeding up segmentation and reducing classification error. The experimental study shows that the new technique is feasible for segmenting and classifying mixed text/graphics/image documents.
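To make the segmentation step concrete, here is a minimal sketch of the classic run-length smoothing algorithm (RLSA) of Wong, Casey and Wahl: white runs no longer than a threshold are blackened horizontally and vertically, the two results are combined with a logical AND, and a final short horizontal smear is applied. It is not the paper's method. The stripe-merging procedure and the feature-based classifier are not reproduced, and 4-connected bounding boxes stand in for block extraction. The function names and the thresholds `c_h`, `c_v` and `c_s` are hypothetical, and the toy bitmap is illustrative only.

```python
from collections import deque
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black (foreground) pixel, 0 = white (background)


def smear_run(line: List[int], limit: int) -> List[int]:
    """Blacken a white run if it is <= limit long and bounded by black on both sides."""
    out = line[:]
    n = len(line)
    i = 0
    while i < n:
        if line[i] == 1:
            i += 1
            continue
        j = i
        while j < n and line[j] == 0:
            j += 1
        # line[i:j] is a maximal white run; leading and trailing runs are left untouched
        if i > 0 and j < n and j - i <= limit:
            out[i:j] = [1] * (j - i)
        i = j
    return out


def transpose(img: Bitmap) -> Bitmap:
    return [list(col) for col in zip(*img)]


def rlsa(img: Bitmap, c_h: int, c_v: int, c_s: int) -> Bitmap:
    """Horizontal smear AND vertical smear, followed by a short horizontal smear."""
    horiz = [smear_run(row, c_h) for row in img]
    vert = transpose([smear_run(col, c_v) for col in transpose(img)])
    combined = [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
    return [smear_run(row, c_s) for row in combined]


def block_boxes(img: Bitmap) -> List[Tuple[int, int, int, int]]:
    """Bounding boxes (top, left, bottom, right) of 4-connected black regions."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
smeared = rlsa(page, c_h=2, c_v=1, c_s=1)
print(block_boxes(smeared))  # [(0, 0, 1, 2), (0, 10, 1, 11), (4, 0, 4, 1)]
```

The one-pixel gap inside the top-left group is bridged, while the wider gaps survive, so the toy page yields three blocks. In practice the thresholds are tuned to the scan resolution and are far larger than in this example.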
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 1201-1209 |
| Number of pages | 9 |
| Journal | Pattern Recognition Letters |
| Volume | 15 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 1994 |
Keywords
- Block classification
- Connectivity histogram
- Document segmentation
- Projection feature