A formal concept analysis-based domain-specific thesaurus and its application in document representation

Jihn Chang Jehng, Shihchieh Chou, Chin Yi Cheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Many document retrieval and clustering techniques based on the vector space model represent documents as vectors. These techniques treat each document as a bag of terms and therefore ignore conceptual relationships between terms, such as synonymy, hypernymy, and hyponymy. Previous studies have shown that exploiting such relationships improves document clustering results. Those studies relied on general thesauri such as WordNet to supply the relationships between terms. However, domain-specific terms such as "query expansion" and "document clustering", which are important features of domain-specific documents, cannot be found in these thesauri. To address this limitation of general thesauri, we propose an approach based on Formal Concept Analysis (FCA) for building a domain-specific thesaurus automatically. We also use this thesaurus as background knowledge to represent documents as concept dimension vectors. In the evaluation, our method produced improved results compared with traditional approaches.
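As a rough illustration of the two ideas in the abstract, the following sketch enumerates the formal concepts of a tiny term context and then projects a document's term frequencies onto those concepts. It is not the authors' procedure: the abstract does not say which attributes the formal context uses, how the concept lattice is turned into thesaurus relations, or how concept dimensions are weighted. Here we assume, hypothetically, that objects are terms, attributes are the documents each term occurs in, and a concept dimension's weight is the summed frequency of the document's terms in that concept's extent. The helper names (`extent_of`, `intent_of`, `formal_concepts`, `concept_vector`) and the toy data are illustrative only.

```python
from itertools import combinations


def extent_of(context, attrs):
    """B': the terms (objects) that have every attribute in attrs."""
    return frozenset(term for term, feats in context.items() if attrs <= feats)


def intent_of(context, terms, all_attrs):
    """A': the attributes shared by every term in terms (all attributes if terms is empty)."""
    if not terms:
        return frozenset(all_attrs)
    return frozenset.intersection(*(frozenset(context[t]) for t in terms))


def formal_concepts(context):
    """Return every formal concept (extent, intent) of the context.

    Closes each attribute subset B into (B', B''). Brute force and exponential
    in the number of attributes, so only suitable for toy contexts.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), r):
            extent = extent_of(context, frozenset(combo))
            concepts.add((extent, intent_of(context, extent, all_attrs)))
    return concepts


def concept_vector(doc_tf, concepts):
    """Map a term-frequency dict onto concept dimensions.

    Hypothetical weighting: a concept's weight is the summed frequency of the
    document's terms that lie in the concept's extent. Concepts are ordered by
    their sorted extent so the dimensions are deterministic.
    """
    ordered = sorted(concepts, key=lambda c: (sorted(c[0]), sorted(c[1])))
    return [sum(doc_tf.get(term, 0) for term in extent) for extent, _ in ordered]


if __name__ == "__main__":
    # Hypothetical toy context: term -> documents in which it occurs.
    context = {
        "query expansion": {"d1", "d2"},
        "relevance feedback": {"d1", "d2"},
        "document clustering": {"d2", "d3"},
        "k-means": {"d3"},
    }
    concepts = formal_concepts(context)
    for extent, intent in sorted(concepts, key=lambda c: (sorted(c[0]), sorted(c[1]))):
        print(sorted(extent), sorted(intent))
    print(concept_vector({"query expansion": 2, "k-means": 1}, concepts))  # prints [0, 0, 1, 3, 2, 2]
```

Terms that share a concept extent, such as "query expansion" and "relevance feedback" in this toy context, behave like related thesaurus entries. The brute-force enumeration is exponential in the number of attributes; a practical FCA implementation would use a dedicated algorithm such as NextClosure instead.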

Original language: English
Title of host publication: Computational Science and Its Applications - ICCSA 2010 - International Conference, Proceedings
Publisher: Springer Verlag
Pages: 431-442
Number of pages: 12
Edition: PART 3
ISBN (Print): 3642121780, 9783642121784
State: Published - 2010
Event: 2010 International Conference on Computational Science and Its Applications, ICCSA 2010, Fukuoka, Japan
Duration: 23 Mar 2010 to 26 Mar 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 3
Volume: 6018 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2010 International Conference on Computational Science and Its Applications, ICCSA 2010
Country/Territory: Japan
City: Fukuoka
Period: 23/03/10 to 26/03/10

Keywords

  • Concept Lattice
  • Formal Concept Analysis
  • Information Retrieval
  • Vector Space Model
