A feature point clustering approach to the recognition of form documents

Forms are among the most important types of documents, and their automatic processing is essential to the advancement of office automation. In this paper, we present a clustering-based approach to recognizing form documents. In our approach, the characters embedded in a form document are first extracted by separating the characters and the structured line patterns into two distinct groups. Next, a clustering process is applied to the corner points of the remaining structured line patterns. Each form document is then represented as a weighted graph according to the clustering result, so that form recognition is formulated as a graph matching problem. Experiments on various kinds of forms demonstrate the feasibility of the proposed method.

  • Document analysis
  • Feature point clustering
  • Maximin clustering algorithm
  • Weighted graph matching
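
To make the clustering step concrete, the following is a minimal sketch of the textbook maximin (max-min distance) heuristic applied to 2-D corner points. It is not taken from the paper: the function name `maximin_clusters`, the parameter `theta`, and the stopping rule (a candidate becomes a new center only if its distance to the nearest existing center exceeds `theta` times the mean distance between centers) are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def maximin_clusters(points, theta=0.5):
    """Cluster 2-D points with a max-min distance heuristic.

    Hypothetical helper, not the paper's implementation.
    Returns (centers, labels), where labels[i] indexes into centers.
    """
    if not points:
        return [], []

    # Assumption: the first point seeds the first cluster center.
    centers = [points[0]]

    while True:
        # Distance from every point to its nearest existing center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        # Candidate: the point farthest from all existing centers.
        idx = max(range(len(points)), key=nearest.__getitem__)

        if len(centers) == 1:
            # With one center, accept any candidate not at distance zero.
            threshold = 0.0
        else:
            pair_d = [math.dist(a, b) for a, b in combinations(centers, 2)]
            threshold = theta * sum(pair_d) / len(pair_d)

        if nearest[idx] <= threshold:
            break
        centers.append(points[idx])

    # Assign each point to its nearest center.
    labels = [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]
    return centers, labels


# Illustrative corner points forming two well-separated groups.
corners = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, labels = maximin_clusters(corners, theta=0.5)
print(len(centers), labels)
```

The heuristic does not need the number of clusters in advance, which suits forms whose line structure varies. The resulting cluster centers could then serve as nodes of a weighted graph; how the edges and weights are defined is not stated in the abstract and is left open here.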


