Using position, fonts and cited references to retrieve scientific documents

Yen Liang Chen, Li Chen Cheng, Yun Ling Cheng

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

As more and more documents become available on the internet, finding documents that fit users' needs from databases containing millions of documents is becoming increasingly important. Since a scientific document is a structured text, it has some useful features that can be used to improve retrieval performance. In this work, we investigate three such features: fonts, position and cited references. While past research has used these three features individually to improve document searching, no existing research discusses how to integrate these three together to improve retrieval performance. This work first investigates the relationships among them, and then uses these three features to design a novel retrieval method based on the discovered relationships. Extensive experiments have been carried out with real scientific documents to show its effectiveness. Our empirical results show that using the location factor alone achieves the same performance as considering location and font factors simultaneously. We also observed that citation similarity is useful only when the similarity is high. Based on these two clues, we developed a method to combine the content vector and reference vector conditionally, and as a result, this integrated approach does, indeed, improve search performance.
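The conditional combination described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, weights, or thresholds, so everything here is an assumption made for illustration: the position weights in `POSITION_WEIGHTS`, the `threshold` and `alpha` parameters, the whitespace tokenization, and the function names themselves. The sketch weights terms by position only (in line with the finding that adding the font factor did not improve on location alone) and folds in citation similarity only when it is high.

```python
import math
from collections import Counter

# Hypothetical position weights; the abstract does not report actual values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def content_vector(sections):
    """Build a position-weighted term vector from a {section_name: text} dict."""
    vec = Counter()
    for section, text in sections.items():
        weight = POSITION_WEIGHTS.get(section, 1.0)
        for term in text.lower().split():
            vec[term] += weight
    return vec


def reference_vector(cited_ids):
    """Represent a document by the identifiers of the references it cites."""
    return Counter(cited_ids)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(q_content, q_refs, d_content, d_refs,
                   threshold=0.5, alpha=0.7):
    """Blend content and citation similarity only when citation similarity is high.

    `threshold` and `alpha` are illustrative assumptions, not values from the paper.
    """
    content_sim = cosine(q_content, d_content)
    ref_sim = cosine(q_refs, d_refs)
    if ref_sim >= threshold:
        return alpha * content_sim + (1.0 - alpha) * ref_sim
    # Low citation overlap is treated as uninformative, so content decides alone.
    return content_sim
```

With these assumed settings, a candidate that shares most of its cited references with the query document gets a score blended from both similarities, while a candidate with little citation overlap is ranked by content similarity alone.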

Original language: English
Pages (from-to): 492-508
Number of pages: 17
Journal: Journal of Information Science
Volume: 33
Issue number: 4
State: Published - Aug 2007

Keywords

  • Cited reference
  • Features
  • Font
  • Information retrieval
  • Position
  • Scientific documents
  • Text
