Using position, fonts and cited references to retrieve scientific documents

Yen Liang Chen, Li Chen Cheng, Yun Ling Cheng

Research output: Contribution to journal › Article › peer-review

1 citation (Scopus)


As more and more documents become available on the internet, finding documents that fit users' needs in databases containing millions of documents is becoming increasingly important. Because a scientific document is a structured text, it has useful features that can be exploited to improve retrieval performance. In this work, we investigate three such features: fonts, position and cited references. While past research has used these features individually to improve document searching, no existing research discusses how to integrate all three to improve retrieval performance. This work first investigates the relationships among them, and then uses the discovered relationships to design a novel retrieval method built on the three features. Extensive experiments have been carried out with real scientific documents to show its effectiveness. Our empirical results show that using the position (location) factor alone achieves the same performance as considering the position and font factors simultaneously. We also observed that citation similarity is useful only when the similarity is high. Based on these two clues, we developed a method that combines the content vector and the reference vector conditionally, and this integrated approach does indeed improve search performance.
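The conditional combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything concrete here is an assumption made for illustration: the function names, the per-section position weights, the cosine measure, the `threshold` that decides when citation similarity counts as "high", and the mixing weight `alpha`. It is not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_vector(sections, position_weights):
    """Build a term vector in which each term occurrence is weighted by
    the section (position) it appears in. The weights are hypothetical."""
    vec = Counter()
    for name, tokens in sections.items():
        weight = position_weights.get(name, 1.0)
        for token in tokens:
            vec[token] += weight
    return dict(vec)


def reference_vector(cited_ids):
    """Binary vector over the identifiers of cited references."""
    return {ref: 1.0 for ref in set(cited_ids)}


def combined_score(query, doc, threshold=0.5, alpha=0.7):
    """Use citation similarity only when it is high (assumed rule):
    below the threshold, fall back to content similarity alone."""
    content_sim = cosine(query["content"], doc["content"])
    citation_sim = cosine(query["refs"], doc["refs"])
    if citation_sim >= threshold:
        return alpha * content_sim + (1 - alpha) * citation_sim
    return content_sim
```

Font information is left out of the sketch because the abstract reports that position alone performs as well as position and fonts together. In practice, `threshold` and `alpha` would have to be tuned on held-out queries.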

Pages (from-to): 492-508
Journal: Journal of Information Science
Publication status: Published - Aug 2007

