Archive knowledge discovery by proxy cache

Hsiang Fu Yu, Yi Ming Chen, Li Ming Tseng

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


An archive is a file containing several related files. Many Internet resources, such as freeware, shareware and trial software, are often packaged into archives for easy installation and retrieval. Additionally, thousands of users search for archives and download them from different sources every day. In this paper, previous research on archive downloading via proxy cache is extended to support archive searching. Internet proxy cache servers are used to gather a significant number of Web pages, detect those that contain archive links, and then use the obtained data to search archives by description or filename. Two schemes, iterative and backtracking, are proposed to obtain Web pages with archive links. The experimental results indicate that both schemes achieve about the same precision; however, the backtracking scheme reduces the number of checked pages by a factor of 26. Finally, a real system was implemented to demonstrate the proposed approaches.
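The detection and search steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the extension list, the use of anchor text as the archive description, and the names `ArchiveLinkParser`, `find_archive_links` and `search_archives` are all assumptions made for illustration, and the iterative and backtracking schemes are not modelled because the abstract does not describe them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed set of archive extensions; the abstract does not list any.
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".tar.gz", ".tgz", ".gz", ".rar")


class ArchiveLinkParser(HTMLParser):
    """Collect (url, filename, description) for each <a> that points to an archive."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # anchor text collected for that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            filename = posixpath.basename(urlparse(url).path)
            if filename.lower().endswith(ARCHIVE_EXTENSIONS):
                # Assumption: the anchor text serves as the description.
                description = " ".join("".join(self._text).split())
                self.links.append((url, filename, description))
            self._href = None


def find_archive_links(html, base_url):
    """Return the archive links found in one cached Web page."""
    parser = ArchiveLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def search_archives(links, query):
    """Case-insensitive substring search over filename and description."""
    q = query.lower()
    return [link for link in links if q in link[1].lower() or q in link[2].lower()]
```

In this sketch, a page counts as "containing archive links" when `find_archive_links` returns a non-empty list, and the collected tuples act as the searchable data. A real system would presumably need a richer description source than anchor text alone.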

Original language: English
Pages (from-to): 34-47
Number of pages: 14
Journal: Internet Research
Issue number: 1
State: Published - 2004


  • Archives
  • Internet
  • Worldwide web

