SVOIS: Support vector oriented instance selection for text classification

Chih Fong Tsai, Che Wei Chang

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Automatic text classification is usually based on models constructed through learning from training examples. However, as the size of text document repositories grows rapidly, the storage requirements and computational cost of model learning are becoming ever higher. Instance selection is one solution for overcoming this limitation. The aim is to reduce the amount of data by filtering out noisy data from a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods were developed for the k-nearest neighbor (k-NN) classifier. In addition, their performance has not been examined in the text classification domain, where the dimensionality of the dataset is usually very high. Support vector machines (SVMs) are a core text classification technique. In this study, a novel instance selection method, called Support Vector Oriented Instance Selection (SVOIS), is proposed. First, a regression plane in the original feature space is identified by utilizing a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane, is used to decide on the support vectors for the selected instances. The experimental results based on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers to perform better than without instance selection.
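
The two-threshold procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, how the regression plane is fitted, or which side of each threshold is kept. The sketch therefore assumes a two-class problem, Euclidean distance, class centers computed as class means, a plane fitted by ordinary least squares on -1/+1 targets, and that instances far from their own class center and close to the plane are retained. The function name `svois_sketch` and the parameters `center_threshold` and `plane_threshold` are hypothetical.

```python
import numpy as np


def svois_sketch(X, y, center_threshold, plane_threshold):
    """Return indices of the training instances kept by a two-threshold filter.

    Illustrative only: every modelling choice below is an assumption,
    not a detail taken from the SVOIS paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch assumes exactly two classes")

    # Step 1: Euclidean distance of each instance to the mean of its own class.
    dist_to_center = np.empty(len(X))
    for c in classes:
        mask = y == c
        center = X[mask].mean(axis=0)
        dist_to_center[mask] = np.linalg.norm(X[mask] - center, axis=1)

    # Assumption: instances at least `center_threshold` away from their
    # class center are the candidates that define the regression plane.
    candidates = np.flatnonzero(dist_to_center >= center_threshold)
    if candidates.size == 0:
        return candidates

    # Step 2 (assumed form): fit w.x + b to -1/+1 class targets by least
    # squares and treat the set w.x + b = 0 as the regression plane.
    targets = np.where(y[candidates] == classes[1], 1.0, -1.0)
    A = np.hstack([X[candidates], np.ones((candidates.size, 1))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    w, b = coef[:-1], coef[-1]
    w_norm = np.linalg.norm(w)
    if w_norm == 0.0:
        # Degenerate fit: no plane direction, so keep all candidates.
        return candidates

    # Keep the candidates lying within `plane_threshold` of the plane.
    dist_to_plane = np.abs(X[candidates] @ w + b) / w_norm
    return candidates[dist_to_plane <= plane_threshold]
```

The returned indices can be used to subset the training set before fitting a k-NN or SVM classifier. Both thresholds are free parameters in this sketch and would need to be tuned for a given dataset.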

Original language: English
Pages (from-to): 1070-1083
Number of pages: 14
Journal: Information Systems
Volume: 38
Issue number: 8
State: Published - 2013

Keywords

  • Data reduction
  • Instance selection
  • Machine learning
  • Support vector machines
  • Text classification
