A weighted cluster-based Chinese text categorization approach: Incorporating with word clusters

Yu Chieh Wu, Jie Chi Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed alternative representations for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is effective for classification and has been shown to reduce the dimensionality of term vectors. We use hierarchical term clustering to aggregate similar terms. To improve performance, we present a modified indexing scheme based on the terms in each cluster. Our test collection was extracted from Chinese NETNEWS, and we used a centroid-based classifier for the categorization task. The results show that term clustering not only reduces the number of dimensions but also outperforms bag of words. Term clustering can therefore be applied to text classification with any large corpus, with the aim of saving time and increasing efficiency and effectiveness. Beyond performance, the clusters can serve as a conceptual knowledge base that preserves related real-world terms.
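The pipeline outlined in the abstract (index documents by term cluster rather than by individual term, then classify with class centroids) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the `term_to_cluster` mapping is written by hand instead of being produced by hierarchical clustering, features are raw counts rather than the paper's weighted scheme, documents are assumed to be already segmented into tokens, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_features(tokens, term_to_cluster):
    """Count features after replacing each term with its cluster id.

    Terms that belong to no cluster keep their own identity.
    """
    return Counter(term_to_cluster.get(t, t) for t in tokens)


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def train_centroids(docs, labels, term_to_cluster):
    """Build one normalized centroid per class from its document vectors."""
    sums = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        vec = l2_normalize(cluster_features(tokens, term_to_cluster))
        for key, weight in vec.items():
            sums[label][key] += weight
    return {label: l2_normalize(total) for label, total in sums.items()}


def classify(tokens, centroids, term_to_cluster):
    """Return the class whose centroid has the highest cosine similarity."""
    vec = l2_normalize(cluster_features(tokens, term_to_cluster))

    def cosine(centroid):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * centroid.get(k, 0.0) for k, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hand-written toy clusters (an assumption; the paper derives them by clustering).
term_to_cluster = {
    "football": "C_SPORT", "basketball": "C_SPORT",
    "stock": "C_FINANCE", "bond": "C_FINANCE",
}
docs = [["football", "match"], ["stock", "market"]]
labels = ["sports", "finance"]

centroids = train_centroids(docs, labels, term_to_cluster)
print(classify(["basketball", "game"], centroids, term_to_cluster))
```

In this toy setup the test document shares no literal term with either training document, yet "basketball" and "football" fall into the same cluster, so the document still matches the sports centroid. The feature space also shrinks, since every term in a cluster collapses to a single dimension.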

Original language: English
Title of host publication: Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012
Pages: 279-282
Number of pages: 4
State: Published - 2012
Event: 1st IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012 - Fukuoka, Japan
Duration: 20 Sep 2012 → 22 Sep 2012

Publication series

Name: Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012

Conference

Conference: 1st IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012
Country/Territory: Japan
City: Fukuoka
Period: 20/09/12 → 22/09/12

Keywords

  • Feature selection
  • Text categorization
  • Vector space model
  • Word clustering
