FiVaTech: Page-level web data extraction from template pages

Mohammed Kayed, Khaled Shaalan, Chia Hui Chang, Moheb Ramzy Girgis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

In this paper, we propose FiVaTech, a new approach to the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema and templates for input pages generated by a CGI program. It uses tree templates to model the generation of dynamic Web pages, and it can deduce the schema and templates for each individual Deep Web site whose pages contain either a single data record or multiple data records. FiVaTech applies tree matching, tree alignment, and mining techniques to accomplish this challenging task. Experiments show encouraging results on the test pages used in many state-of-the-art Web data extraction works.

Original language: English
Title of host publication: ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
Pages: 15-20
Number of pages: 6
DOIs
Publication status: Published - 2007
Event: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007 - Omaha, NE, United States
Duration: 28 Oct 2007 to 31 Oct 2007

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Country/Territory: United States
City: Omaha, NE
Period: 28/10/07 to 31/10/07
