An evaluation of the formal concept analysis-based document vector on document clustering

Jihn Chang Jehng, Shihchieh Chou, Chin Yi Cheng, Jia Sheng Heh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

In conventional approaches, a document is represented by a vector whose dimensions correspond to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore conceptual relationships between terms, such as synonymy, hypernymy and hyponymy. Previous research has applied thesauri such as WordNet to solve this problem. However, such thesauri are developed for general purposes and offer limited coverage of specific domains. Therefore, an automatically built term ontology is desirable. In our previous study, we proposed a method that applies formal concept analysis (FCA), an automatic ontology-building method, to extract term relationships from a document set, and then uses the extracted information as a term ontology to represent the documents as concept vectors. To evaluate the usability and effectiveness of the proposed method for information retrieval applications, we apply the concept vectors generated for the documents to document clustering. In this study, we use bisecting k-means clustering and hierarchical agglomerative clustering as the platforms with which to evaluate our method.
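The abstract does not spell out how concept vectors are built, so the following is only a minimal sketch of the general idea, not the authors' construction. It assumes a toy binary document-term context (hypothetical data), enumerates formal concepts by naive closure over document subsets, and then represents each document as a binary vector with one dimension per concept, set to 1 when the document belongs to that concept's extent. All function names here are illustrative.

```python
from itertools import combinations

# Toy document-term context (hypothetical data): document -> set of terms.
context = {
    "d1": {"cat", "pet"},
    "d2": {"dog", "pet"},
    "d3": {"cat", "dog", "pet"},
    "d4": {"car"},
}


def common_terms(docs, context, all_terms):
    """Intent: the terms shared by every document in `docs`."""
    result = set(all_terms)
    for d in docs:
        result &= context[d]
    return frozenset(result)


def docs_with(terms, context):
    """Extent: the documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every document subset.

    Exponential in the number of documents, so suitable for toy data only.
    """
    all_terms = set().union(*context.values())
    docs = sorted(context)
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_terms(subset, context, all_terms)
            extent = docs_with(intent, context)
            concepts.add((extent, intent))
    # Deterministic order: general concepts (small intents) first.
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


def concept_vector(doc, concepts):
    """Assumed representation: 1 if `doc` is in the concept's extent, else 0."""
    return [1 if doc in extent else 0 for extent, _ in concepts]


concepts = formal_concepts(context)
for doc in sorted(context):
    print(doc, concept_vector(doc, concepts))
```

In this sketch, d1 and d2 share no term other than "pet" yet both score 1 on the concept whose intent is {"pet"}, which is the kind of shared, more general dimension a plain term vector does not make explicit. Vectors of this form could then be passed to any standard clustering routine.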

Original language: English
Title of host publication: Proceedings - 2011 International Conference on Computational Science and Its Applications, ICCSA 2011
Pages: 207-210
Number of pages: 4
DOIs
State: Published - 2011
Event: 11th International Conference on Computational Science and Its Applications, ICCSA 2011 - Santander, Spain
Duration: 20 Jun 2011 to 23 Jun 2011

Publication series

Name: Proceedings - 2011 International Conference on Computational Science and Its Applications, ICCSA 2011

Conference

Conference: 11th International Conference on Computational Science and Its Applications, ICCSA 2011
Country/Territory: Spain
City: Santander
Period: 20/06/11 to 23/06/11

Keywords

  • Concept vector
  • Document clustering
  • Document vector
  • Formal concept analysis
  • Term ontology
