Abstract
Hashing schemes are a common technique for improving performance when mining not only association rules but also sequential patterns and traversal patterns. However, collisions in hash schemes may cause severe performance degradation. In this paper, we propose perfect hashing schemes for mining traversal patterns that avoid collisions in the hash table. The main idea is to transform each large itemset into one large 2-itemset by employing a delicate encoding scheme. Perfect hash schemes designed only for itemsets of length two, rather than of varied lengths, are then applied. The experimental results show that our method is more than twice as fast as the FS algorithm. The results also show that our method scales with database size. One variant of our perfect hash scheme, called partial hash, is proposed to cope with the enormous memory space required by typical perfect hash functions. We also compare the performance of the different perfect hash variants and investigate their properties.
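To illustrate what a collision-free hash over 2-itemsets can look like, the following is a minimal sketch and not the encoding or hash function proposed in the paper. It assumes items are identified by integers `0..n-1` and uses the standard triangular-index mapping, which sends each unordered pair to its own table slot. The names `pair_index` and `count_pairs` are hypothetical and introduced only for this example.

```python
from itertools import combinations


def pair_index(i: int, j: int, n: int) -> int:
    """Map an unordered pair {i, j} of distinct item ids in range(n)
    to a unique slot in range(n * (n - 1) // 2)."""
    if i == j:
        raise ValueError("a 2-itemset needs two distinct items")
    if i > j:
        i, j = j, i  # order does not matter for an itemset
    if i < 0 or j >= n:
        raise ValueError("item id out of range")
    # Pairs with a smaller first item fill the earlier slots;
    # (j - i - 1) is the offset within the block of pairs starting at i.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def count_pairs(transactions, n):
    """Count the support of every 2-itemset using a flat, collision-free table."""
    counts = [0] * (n * (n - 1) // 2)
    for transaction in transactions:
        # Deduplicate and sort so each pair is visited once per transaction.
        for i, j in combinations(sorted(set(transaction)), 2):
            counts[pair_index(i, j, n)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative data only (assumption: 4 distinct items, ids 0..3).
    demo = [[0, 1, 2], [1, 2], [2, 3, 1]]
    print(count_pairs(demo, 4))
```

In this sketch the table has `n * (n - 1) // 2` slots, so its size grows quadratically with the number of distinct items. That is the kind of memory cost a collision-free table can incur.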
Original language | English |
---|---|
Pages (from-to) | 185-202 |
Number of pages | 18 |
Journal | Fundamenta Informaticae |
Volume | 70 |
Issue number | 3 |
State | Published - 2006 |
Keywords
- Data mining
- Perfect hashing
- Performance analysis
- Traversal patterns