TY - JOUR
T1 - A similarity-based method for retrieving documents from the SCI/SSCI database
AU - Chen, Yen Liang
AU - Wei, Jhong Jhih
AU - Wu, Shin Yi
AU - Hu, Ya Han
PY - 2006/10
Y1 - 2006/10
N2 - As more and more documents become electronically available, finding documents in large databases that fit users' needs is becoming increasingly important. In the past, the document search problem was dealt with using the database query approach or the text-based search approach. In this paper, we investigate this problem, focusing on the SCI/SSCI databases from ISI. Specifically, we design our search methodology based on the four fields commonly seen in a scientific research document: abstract, title, keywords, and reference list. Of these four, only the abstract field can be viewed as normal text, while the other three have their own characteristics that differentiate them from text. Therefore, we first develop a method to compute the similarity value for each field. Our next problem is combining the four similarity values into a final value. One approach is to assign a weight to each and compute the weighted sum. We have not adopted this simple weighting method, however, because it is difficult to determine appropriate weights. Instead, we use a back-propagation neural network to combine them. Finally, extensive experiments have been carried out using real documents drawn from the TKDE journal, and the results indicate that in all situations our method has a much higher accuracy than the traditional text-based search approach.
KW - Information retrieval
KW - Neural network
KW - Similarity
UR - http://www.scopus.com/inward/record.url?scp=33749835218&partnerID=8YFLogxK
U2 - 10.1177/0165551506065814
DO - 10.1177/0165551506065814
M3 - Journal article
AN - SCOPUS:33749835218
SN - 0165-5515
VL - 32
SP - 449
EP - 464
JO - Journal of Information Science
JF - Journal of Information Science
IS - 5
ER -