FiVaTech: Page-level web data extraction from template pages

Mohammed Kayed, Khaled Shaalan, Chia Hui Chang, Moheb Ramzy Girgis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

In this paper, we propose a new approach, called FiVaTech, to the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema and templates for input pages generated by a CGI program. FiVaTech uses tree templates to model the generation of dynamic Web pages. It can deduce the schema and templates for each individual Deep Web site, whether a Web page contains a single data record or multiple data records. FiVaTech applies tree matching, tree alignment, and mining techniques to accomplish this challenging task. Experiments show encouraging results on the test pages used in many state-of-the-art Web data extraction works.
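
The abstract names tree matching as one of FiVaTech's building blocks but does not say which algorithm is used. As a rough illustration of the general idea only, the sketch below implements the classic Simple Tree Matching dynamic program over two small DOM-like trees. It is a stand-in under stated assumptions, not the authors' method: the `Node` class, the function names, and the normalized similarity score are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down, order-preserving matching of two trees.

    This is the textbook Simple Tree Matching recurrence, used here only as
    an illustrative baseline (assumption: not necessarily FiVaTech's variant).
    """
    if a.tag != b.tag:
        return 0  # roots differ, so nothing below them may be matched
    m, n = len(a.children), len(b.children)
    # table[i][j]: best score using the first i children of a and first j of b
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1  # +1 counts the matched roots


def tree_size(node: Node) -> int:
    """Number of nodes in the tree rooted at node."""
    return 1 + sum(tree_size(child) for child in node.children)


def similarity(a: Node, b: Node) -> float:
    """Matching score normalized to [0, 1] by the average tree size."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


def make_list(item_count: int) -> Node:
    """Build <ul> with item_count <li><a/></li> children (toy data records)."""
    return Node("ul", [Node("li", [Node("a")]) for _ in range(item_count)])


if __name__ == "__main__":
    two_records = make_list(2)
    three_records = make_list(3)
    print(simple_tree_matching(two_records, three_records))  # 5
    print(round(similarity(two_records, three_records), 3))  # 0.833
```

In this toy case the two pages share a template and differ only in the number of records, so the matching score stays high even though the trees are not identical. That is the kind of signal a template-deduction system could use to group repeated record subtrees, though how FiVaTech actually does so is described in the paper rather than here.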

Original language: English
Title of host publication: ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
Pages: 15-20
Number of pages: 6
State: Published - 2007
Event: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007 - Omaha, NE, United States
Duration: 28 Oct 2007 to 31 Oct 2007

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Country/Territory: United States
City: Omaha, NE
Period: 28/10/07 to 31/10/07
