An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports

Ya Han Hu, Fan Wu, Yi Jiun Liao

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Sequential pattern mining (SPM) is an important technique for determining time-related behavior in sequence databases. In real-life applications, the frequencies of the various items in a sequence database are not equal. If all items are assigned the same minimum support (minsup), the rare item problem may result: interesting patterns cannot be retrieved effectively whether minsup is set too high or too low. Liu (2006) first introduced the concept of multiple minimum supports (MMSs) into SPM. It allows users to specify a minimum item support (MIS) for each item according to its natural frequency. A generalized sequential pattern-based algorithm, named Multiple Supports-Generalized Sequential Pattern (MS-GSP), was also developed to mine the complete set of sequential patterns. However, MS-GSP adopts a candidate generate-and-test approach, which has been recognized as a costly and time-consuming method of pattern discovery. For the efficient mining of sequential patterns with MMSs, this study first proposes a compact data structure, called a Preorder Linked Multiple Supports tree (PLMS-tree), to store and compress the entire sequence database. Based on the PLMS-tree, we develop an efficient algorithm, Multiple Supports-Conditional Pattern growth (MSCP-growth), to discover the complete set of patterns. The experimental results show that the proposed approach performs better than MS-GSP and conventional SPM.
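The MIS idea in the abstract can be illustrated with a minimal sketch. It is not the PLMS-tree or MSCP-growth described above; it only shows how per-item thresholds change which patterns count as frequent. It assumes sequences of single items (no itemsets) and takes a pattern's threshold to be the lowest MIS among its items, as in the MMS model. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def pattern_minsup(pattern: Sequence[str], mis: Dict[str, float]) -> float:
    """Assumed MMS rule: a pattern's threshold is the lowest MIS of its items."""
    return min(mis[item] for item in pattern)


def is_frequent(pattern: Sequence[str],
                database: List[Sequence[str]],
                mis: Dict[str, float]) -> bool:
    return support(pattern, database) >= pattern_minsup(pattern, mis)


# Toy sequence database: "d" is a rare item.
db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "b"],
    ["a", "d", "c"],
    ["b", "a"],
]

# One uniform minsup for all items versus a lower MIS for the rare item.
uniform = {"a": 0.4, "b": 0.4, "c": 0.4, "d": 0.4}
per_item = {"a": 0.6, "b": 0.4, "c": 0.4, "d": 0.1}

rare_pattern = ["a", "d"]
print(is_frequent(rare_pattern, db, uniform))
print(is_frequent(rare_pattern, db, per_item))
```

With a single threshold, the pattern containing the rare item is discarded unless minsup is lowered for every item, which would also admit many uninteresting patterns among the common items. A per-item MIS avoids that trade-off.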

Original language: English
Pages (from-to): 1224-1238
Number of pages: 15
Journal: Journal of Systems and Software
Volume: 86
Issue number: 5
State: Published - May 2013

Keywords

  • Data mining
  • Multiple minimum supports
  • PLWAP-tree
  • Sequential patterns
