An archive is a file containing several related files. Many Internet resources, such as freeware, shareware and trail software, are often packaged into archives for easy installation and taking, Additionally, thousands of users search for archives and download them from different sources everyday. In this paper, previous research on archive downloading is extended via proxy cache to support archive searching. Internet proxy cache servers are used to gather a significant number of Web pages, detect those that contain archive links, and then use the obtained data to search archives by description or filename. Two schemes, iterative and backtracking, are proposed to obtain Web pages with archive links. The experimental results indicate that the precision that both of the schemes can achieve is about the same; however, the backtracking scheme reduces the number of checked pages by a factor of 26. Finally, a real system was implemented to demonstrate the proposed approaches.
- Worldwide web