Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations

Chia Wei Wu, Richard Tzong Han Tsai, Cheng Wei Lee, Wen Lian Hsu

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike conventional text categorization, taxonomy integration can exploit extra information in the source taxonomy to improve categorization. The major existing methods fall into two types: those that use neighboring categories to smooth the document term vector, and those that use the semantic relationship between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which uses only a flattened hierarchy for smoothing, we apply a hierarchical shrinkage algorithm that smooths child documents with their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which take into account the relationships between lower-level categories. In addition, we measure the semantic relationships with cosine similarity, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. Experiments on real Internet data show that our system outperforms standard text classifiers by about 10%.
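The two ingredients named in the abstract, smoothing a child document with its parents and comparing categories by cosine similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the shrinkage weights are estimated, so fixed interpolation weights are assumed here, term vectors are assumed to be sparse dictionaries, and the function names `shrink` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def shrink(doc_vec, ancestor_vecs, weights):
    """Interpolate a document's term vector with its ancestors' vectors.

    ASSUMPTION: fixed weights supplied by the caller; the paper may
    estimate them differently. weights[0] applies to the document itself,
    weights[i] to the i-th ancestor (nearest parent first).
    """
    if len(weights) != len(ancestor_vecs) + 1:
        raise ValueError("need exactly one weight per vector")
    smoothed = Counter()
    for w, vec in zip(weights, [doc_vec, *ancestor_vecs]):
        for term, value in vec.items():
            smoothed[term] += w * value
    return dict(smoothed)


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy data: a document and the term vector of its parent category.
doc = {"camera": 1.0}
parent = {"camera": 0.5, "lens": 0.5}

# The document now also carries some weight for "lens", inherited from its parent.
smoothed = shrink(doc, [parent], [0.7, 0.3])

# Similarity between the smoothed document and a hypothetical target category.
target_category = {"lens": 1.0, "tripod": 1.0}
similarity = cosine(smoothed, target_category)
```

Passing more than one ancestor vector, with one weight each, corresponds to smoothing with additional hierarchical levels; in the unsmoothed case the toy document would have zero similarity to the target category because they share no terms.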

Original language: English
Pages (from-to): 2123-2131
Number of pages: 9
Journal: Expert Systems with Applications
Issue number: 4
State: Published - Nov 2008


  • Shrinkage algorithm
  • Text categorization
  • Web taxonomy integration

