Applying genetic algorithms to query optimization in document retrieval

Jorng Tzong Horng, Ching Chang Yeh

Research output: Contribution to journal › Article › peer-review

101 Scopus citations

Abstract

This paper proposes a novel approach that automatically retrieves keywords and then uses genetic algorithms to adapt the keyword weights. One contribution of the paper is to combine the bigram model with the PAT-tree structure for keyword retrieval: the approach extracts bigrams from documents and uses them to construct a PAT-tree from which keywords are retrieved. The proposed approach can retrieve any type of keyword, such as technical keywords and personal names. Its effectiveness is demonstrated by comparing the keywords it finds with those found by the PAT-tree based approach. The comparison shows that the proposed keyword retrieval approach is as accurate as the PAT-tree based approach, yet faster and less memory-intensive. The study then applies genetic algorithms to tune the weights of the retrieved keywords. Finally, several documents obtained from web sites are tested and the experimental results are compared with those of other approaches, indicating that the proposed approach is highly promising for applications.
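
The weight-tuning step can be illustrated with a minimal sketch. The abstract does not give the chromosome encoding, fitness function, or genetic operators used in the paper, so everything below is an assumption made for illustration: a chromosome is a list of keyword weights in [0, 1), fitness is the mean score of relevant documents minus the mean score of non-relevant ones, and the names `fitness` and `evolve` and all parameter values are hypothetical.

```python
import random


def fitness(weights, docs, relevant):
    """Assumed fitness: mean weighted score of relevant documents
    minus mean weighted score of non-relevant documents."""
    def score(doc):
        return sum(w * x for w, x in zip(weights, doc))

    rel = [score(d) for d, r in zip(docs, relevant) if r]
    non = [score(d) for d, r in zip(docs, relevant) if not r]
    return sum(rel) / len(rel) - sum(non) / len(non)


def evolve(docs, relevant, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Toy genetic algorithm over keyword-weight vectors (illustrative only).

    docs     -- keyword-occurrence vectors, one per document (>= 2 keywords)
    relevant -- parallel list of booleans; both classes must be present
    Returns the best weight vector and the best fitness per generation.
    """
    rng = random.Random(seed)
    n = len(docs[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank the population; the top half survives unchanged (elitism).
        pop.sort(key=lambda w: fitness(w, docs, relevant), reverse=True)
        history.append(fitness(pop[0], docs, relevant))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replace a weight with a fresh random one.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda w: fitness(w, docs, relevant))
    return best, history


# Hypothetical data: rows are documents, columns are three keywords.
docs = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]]
relevant = [True, True, False, False]
best_weights, history = evolve(docs, relevant)
```

Because the top half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases; this is a property of the sketch, not a result reported in the paper.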

Original language: English
Pages (from-to): 737-759
Number of pages: 23
Journal: Information Processing and Management
Volume: 36
Issue number: 5
State: Published - 1 Sep 2000
