Project Details
Description
The purpose of the attribute-oriented induction (AOI) method is to find the generalized characteristics of data in a relational table. Although previous AOI methods can find a generalized table that describes the characteristics of the original relation, a common drawback is that they lack a formal definition for measuring the induction effect of their results. For this reason, they cannot guarantee that the generalized tables they find achieve the best induction effect. Accordingly, this study formally defines how to measure the induction cost of a generalized tuple and, on this basis, proposes three methods for finding generalized tables from data. The first method applies the traditional framework of agglomerative clustering algorithms. Its advantage is that each time it selects two generalized tuples to combine, it chooses the pair with the minimum induction cost, so the resulting generalized tables have a lower induction cost than those found by traditional AOI methods. The second method applies traditional genetic algorithms (GA). Although GA-based algorithms are usually slower because of their costly evolution process, the proposed method can obtain generalized results very close to the optimal solution, because its repeated evolution process keeps improving the solution. Finally, the third method is based on the observation that data is composed of information and noise. If the noise can be removed while the information in the data is kept, a better induction effect results. Accordingly, the third method is a greedy method that finds the generalized table with minimal induction cost under the condition that at most x% of the data may be discarded as noise.
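The first method's merge loop can be illustrated with a minimal Python sketch. The description does not state the study's formal induction-cost definition, so the cost used here is an assumption made only for illustration: the number of concept-hierarchy levels each value is lifted, weighted by how many original rows the generalized tuple covers. The hierarchy `PARENT`, the helper names, and the stopping parameter `k` (target number of generalized tuples) are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical concept hierarchy (child -> parent); "ANY" is the shared root.
PARENT = {
    "Taipei": "Taiwan", "Taoyuan": "Taiwan",
    "Tokyo": "Japan", "Osaka": "Japan",
    "Taiwan": "ANY", "Japan": "ANY",
    "MSc": "Graduate", "PhD": "Graduate", "BSc": "Undergraduate",
    "Graduate": "ANY", "Undergraduate": "ANY",
}

def ancestors(value):
    """Return the chain [value, parent, ..., 'ANY']."""
    chain = [value]
    while chain[-1] != "ANY":
        chain.append(PARENT[chain[-1]])
    return chain

def lift(a, b):
    """Lowest common concept of a and b, plus the levels each value climbs."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for up_a, concept in enumerate(chain_a):
        if concept in chain_b:
            return concept, up_a, chain_b.index(concept)
    raise ValueError("values do not share a root concept")

def merge(t1, t2):
    """Merge two generalized tuples (values, count).

    Returns the merged tuple and its ASSUMED induction cost: levels climbed
    per attribute, weighted by the number of rows each side covers.
    """
    (vals1, n1), (vals2, n2) = t1, t2
    merged, cost = [], 0
    for a, b in zip(vals1, vals2):
        concept, up_a, up_b = lift(a, b)
        merged.append(concept)
        cost += up_a * n1 + up_b * n2
    return (tuple(merged), n1 + n2), cost

def agglomerate(rows, k):
    """Repeatedly merge the cheapest pair until k tuples remain (k >= 1)."""
    table = [(tuple(row), 1) for row in rows]
    total = 0
    while len(table) > k:
        # Pick the pair of generalized tuples with minimum merge cost.
        i, j = min(combinations(range(len(table)), 2),
                   key=lambda p: merge(table[p[0]], table[p[1]])[1])
        merged, cost = merge(table[i], table[j])
        table = [t for idx, t in enumerate(table) if idx not in (i, j)]
        table.append(merged)
        total += cost
    return table, total

rows = [("Taipei", "MSc"), ("Taoyuan", "PhD"), ("Tokyo", "BSc"), ("Osaka", "BSc")]
table, total = agglomerate(rows, k=2)
print(table, total)
```

Under this assumed cost, the two Japanese rows are merged first because they differ in only one attribute, and the two Taiwanese rows are merged next. Checking every pair at each step makes the sketch quadratic per merge, which is acceptable for illustration but would need an efficient pair-selection structure on a real relation.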
Status | Finished |
---|---|
Effective start/end date | 1/08/17 → 31/07/18 |
Keywords
- Attribute-Oriented Induction
- Clustering
- Genetic algorithm (GA)
- Greedy Algorithm
- Generalized Table