Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations

Chia Wei Wu, Richard Tzong Han Tsai, Cheng Wei Lee, Wen Lian Hsu

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike the conventional text categorization problem, taxonomy integration can exploit extra information contained in the source taxonomy to improve categorization. The major existing methods can be divided into two types: those that use neighboring categories to smooth the document term vector, and those that use the semantic relationship between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which uses only a flattened hierarchy for smoothing, we apply a hierarchical shrinkage algorithm to smooth child documents by their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which consider the relationships between lower-level categories. In addition, we use cosine similarity to measure the semantic relationships, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. The results of experiments on real Internet data demonstrate that our system outperforms standard text classifiers by about 10%.
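The two building blocks named in the abstract, smoothing a child's term vector by its parents and comparing categories with cosine similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`term_distribution`, `shrink`, `cosine`), the sparse-dictionary representation, and the fixed mixing weights are all assumptions made for illustration. The abstract does not say how the shrinkage weights are chosen, so they are simply passed in here.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Turn a list of tokens into a relative-frequency term vector (a dict)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def shrink(vectors, weights):
    """Linearly interpolate term vectors ordered from child up to ancestors.

    `weights` are hypothetical fixed mixing weights (one per level) that
    should sum to 1; how the paper sets them is not stated in the abstract.
    """
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per hierarchy level")
    smoothed = {}
    for vec, w in zip(vectors, weights):
        for term, p in vec.items():
            smoothed[term] = smoothed.get(term, 0.0) + w * p
    return smoothed


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy data: a child category's documents and those of its parent category.
child = term_distribution(["laptop", "battery", "laptop"])
parent = term_distribution(["laptop", "computer", "hardware", "battery"])

# The child keeps most of its own mass but gains terms seen only in the parent.
smoothed = shrink([child, parent], [0.7, 0.3])
similarity = cosine(smoothed, parent)
```

In this toy case the smoothed child vector gains non-zero mass for "computer" and "hardware", terms that appear only at the parent level, which is the intended effect of smoothing a sparse child category by its ancestors. Passing more ancestor vectors to `shrink` corresponds to using more hierarchical levels for smoothing.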

Pages (from-to): 2123-2131
Journal: Expert Systems with Applications
Publication status: Published - Nov 2008

