Document analysis plays an important role in office automation, especially in intelligent signal processing. In this paper, we propose an intelligent document analysis system for document segmentation and identification. The proposed system consists of two modules: block segmentation and block identification. We first segment a document into non-overlapping blocks using a novel recursive segmentation technique, then extract features from each segmented block. Two kinds of features are extracted: the connectivity histogram and multi-resolution features. Both are verified to be effective in characterizing document blocks. Finally, a two-layer perceptron in the identification module determines the identity of each block. Experiments with a wide variety of documents verify the feasibility of our approach.
- Alternative tree representation
- Connectivity histogram
- Document analysis
- Multi-resolution features
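
To make the block segmentation step concrete, the following is a minimal sketch of one way a document image could be split recursively into non-overlapping blocks. The abstract does not specify the proposed recursive technique, so this sketch substitutes the classic recursive X-Y cut as a stand-in. The function names `xy_cut` and `_widest_gap`, the `min_gap` threshold, and the binary-image representation (nested lists, nonzero meaning ink) are all assumptions made for illustration.

```python
def _widest_gap(profile, min_gap):
    """Return (start, end) of the widest run of empty entries that is at
    least min_gap long, or None. Assumes the first and last entries of
    the profile contain ink, so every empty run is interior."""
    best, start = None, None
    for i, has_ink in enumerate(profile):
        if not has_ink:
            if start is None:
                start = i
        elif start is not None:
            width = i - start
            if width >= min_gap and (best is None or width > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def xy_cut(img, min_gap=2, box=None):
    """Recursive X-Y cut (an assumed stand-in, not the paper's method).

    img is a list of rows; nonzero pixels are ink. Returns a list of
    half-open boxes (top, bottom, left, right) that do not overlap.
    """
    if box is None:
        box = (0, len(img), 0, len(img[0]) if img else 0)
    top, bottom, left, right = box

    # Shrink the region to the bounding box of its ink.
    rows = [r for r in range(top, bottom) if any(img[r][left:right])]
    if not rows:
        return []
    cols = [c for c in range(left, right)
            if any(img[r][c] for r in range(top, bottom))]
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    # Projection profiles: does each row / column contain any ink?
    row_profile = [any(img[r][left:right]) for r in range(top, bottom)]
    col_profile = [any(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    row_gap = _widest_gap(row_profile, min_gap)
    col_gap = _widest_gap(col_profile, min_gap)

    # No sufficiently wide white gap: this region is a single block.
    if row_gap is None and col_gap is None:
        return [(top, bottom, left, right)]

    row_w = row_gap[1] - row_gap[0] if row_gap else -1
    col_w = col_gap[1] - col_gap[0] if col_gap else -1
    if row_w >= col_w:
        a, b = row_gap  # cut along the horizontal white band
        return (xy_cut(img, min_gap, (top, top + a, left, right)) +
                xy_cut(img, min_gap, (top + b, bottom, left, right)))
    a, b = col_gap      # cut along the vertical white band
    return (xy_cut(img, min_gap, (top, bottom, left, left + a)) +
            xy_cut(img, min_gap, (top, bottom, left + b, right)))
```

In a full system of the kind described, each returned box would then be passed to feature extraction and on to the classifier. The choice to cut along the widest gap first is a design decision of this sketch, not something stated in the abstract.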