A calculation mechanism for similarity measure with clustering an unbalanced hierarchical terminology structure

Min Tzu Wang, Ping Yu Hsu, K. C. Lin, Jason Hung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Effective retrieval of relevant information is often quite useful to users, for example when querying the corresponding knowledge or information, especially for online e-learners. The most common method is to use synonyms and antonyms from a dictionary of the most frequent terms. However, sometimes the focus is on a pair or a set of associated keywords supplied by the user rather than on terms with the same meaning. Generally, association rules would be adopted to solve this problem. Nonetheless, the keyword or term sets extracted from large volumes of queries often contain sparse information: they span a wide range of keywords, while each term set contains only a few terms. Such data leave basket analysis with extremely low item support. Lifting the terms to a higher level of the concept hierarchy may yield enough support, but it loses the detailed information. Although a similarity measure represented by counting the depth of the least common ancestor, normalized by the depth of the concept tree, lifts the limitation of binary equality, it produces counterintuitive results when the concept hierarchy is unbalanced, since two terms in deeper subtrees are very likely to have a higher similarity than two terms in shallower subtrees. This research proposes to solve these issues by calculating the distance between two terms as the number of edge traversals needed to link them, counted from the user's viewpoint. The method is straightforward, yet it achieves better outcomes for information queries when the concept hierarchy is unbalanced.
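The contrast between the two measures can be illustrated with a minimal sketch. The hierarchy, the term names, and all function names below are hypothetical and do not come from the paper. The sketch also assumes plain edge counting through the least common ancestor, since the abstract does not specify how the user's viewpoint enters the count.

```python
# Hypothetical, deliberately unbalanced concept hierarchy stored as
# child -> parent (the root maps to None). Not taken from the paper.
PARENT = {
    "root": None,
    "A": "root", "A1": "A", "A2": "A",
    "B": "root", "B1": "B", "B2": "B1", "B2a": "B2", "B2b": "B2",
}


def path_to_root(term):
    """Return the list of nodes from term up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Number of edges between term and the root."""
    return len(path_to_root(term)) - 1


def least_common_ancestor(a, b):
    """Deepest node that is an ancestor of (or equal to) both terms."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("terms are not in the same tree")


# Depth of the whole concept tree, used as the normalizer.
TREE_DEPTH = max(depth(t) for t in PARENT)


def lca_similarity(a, b):
    """Depth of the least common ancestor normalized by the tree depth."""
    return depth(least_common_ancestor(a, b)) / TREE_DEPTH


def edge_distance(a, b):
    """Edges traversed to link a and b via their least common ancestor
    (assumption: plain edge counting, no user-specific weighting)."""
    lca = least_common_ancestor(a, b)
    return depth(a) + depth(b) - 2 * depth(lca)


# Both pairs are siblings, yet the depth-based similarity differs.
print(lca_similarity("A1", "A2"), lca_similarity("B2a", "B2b"))
print(edge_distance("A1", "A2"), edge_distance("B2a", "B2b"))
```

In this toy tree both pairs are siblings, but the depth-based measure rates the pair in the deeper subtree as more similar, while the edge count treats the two pairs identically.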

Original language: English
Title of host publication: 2007 International Conference on Parallel Processing Workshops, ICPPW
State: Published - 2007
Event: 2007 International Conference on Parallel Processing Workshops, ICPPW 2007 - Xian, China
Duration: 10 Sep 2007 → 14 Sep 2007

Publication series

Name: Proceedings of the International Conference on Parallel Processing Workshops
ISSN (Print): 1530-2016

Conference

Conference: 2007 International Conference on Parallel Processing Workshops, ICPPW 2007
Country/Territory: China
City: Xian
Period: 10/09/07 → 14/09/07

Keywords

  • Clustering
  • Data mining
  • Hierarchy
  • Similarity measure
