Automatic information extraction for multiple singular web pages

Chia Hui Chang, Shih Chien Kuo, Kuo Yu Hwang, Tsung Hsin Ho, Chih Lung Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

The World Wide Web is now undeniably the richest and densest source of information, yet its structure makes it difficult to use that information systematically. This paper extends a pattern discovery approach called IEPAD to the rapid generation of information extractors that extract structured data from semi-structured Web documents. IEPAD was proposed to automate wrapper generation from a multiple-record Web page without user-labeled examples. In this paper, we consider another case, in which multiple Web pages are available but each input page contains only one record (called singular Web pages). To handle this case, we propose a hierarchical multiple string alignment that enables wrapper induction from multiple singular Web pages.
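To make the alignment idea concrete, here is a minimal sketch of inducing a record template from several singular pages by multiple string alignment. It is not the authors' hierarchical method: it is a flat, progressive alignment in which each page is aligned (Needleman-Wunsch style) to the column consensus of the pages already aligned. The tokenization (tag names kept, every text run abstracted to `TEXT`), the scoring values, and the function names `tokenize`, `nw_align`, `align_pages`, and `extract` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

Token = Tuple[str, str]       # (symbol, text); symbol is a tag like "<td>" or "TEXT"
Row = List[Optional[Token]]   # one aligned page; None marks a gap

def tokenize(html: str) -> List[Token]:
    """Turn HTML into tag tokens; each non-blank text run becomes ("TEXT", text)."""
    tokens: List[Token] = []
    for tag, text in re.findall(r"(<[^>]+>)|([^<]+)", html):
        if tag:
            m = re.match(r"<\s*(/?)\s*([A-Za-z0-9]+)", tag)
            if m:  # comments and doctypes do not match and are skipped
                tokens.append((f"<{m.group(1)}{m.group(2).lower()}>", ""))
        elif text.strip():
            tokens.append(("TEXT", text.strip()))
    return tokens

def nw_align(a: List[str], b: List[str], match: int = 2,
             mismatch: int = -3, gap: int = -1) -> List[Tuple[Optional[int], Optional[int]]]:
    """Global alignment; mismatch < 2*gap, so unequal symbols are never paired."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover index pairs (None = gap).
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, j - 1)); j -= 1
    pairs.reverse()
    return pairs

def align_pages(pages: List[List[Token]]) -> List[Row]:
    """Progressively align each page to the majority symbol of every column so far."""
    rows: List[Row] = [list(pages[0])]
    for seq in pages[1:]:
        cons = [Counter(row[k][0] for row in rows if row[k] is not None).most_common(1)[0][0]
                for k in range(len(rows[0]))]
        pairs = nw_align(cons, [sym for sym, _ in seq])
        rows = [[row[i] if i is not None else None for i, _ in pairs] for row in rows] + \
               [[seq[j] if j is not None else None for _, j in pairs]]
    return rows

def extract(pages_html: List[str]) -> List[List[Optional[str]]]:
    """Return one record per page: the text in every variable TEXT column."""
    rows = align_pages([tokenize(h) for h in pages_html])
    records: List[List[Optional[str]]] = [[] for _ in rows]
    for k in range(len(rows[0])):
        column = [row[k] for row in rows]
        present = [tok for tok in column if tok is not None]
        if present[0][0] != "TEXT":
            continue  # tag column: part of the template, not data
        texts = [tok[1] if tok is not None else None for tok in column]
        if None not in texts and len(set(texts)) == 1:
            continue  # same text on every page: a static label such as "Title:"
        for r, text in enumerate(texts):
            records[r].append(text)  # None means the field is missing on that page
    return records
```

On three toy pages where only the third carries an extra optional `<u>` field, the tag columns and the shared label "Title:" are treated as template, and each page yields one record with `None` in the optional slot. A hierarchical variant would presumably align coarser blocks first and refine within them; that refinement is not attempted here.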

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 6th Pacific-Asia Conference, PAKDD 2002, Proceedings
Editors: Ming-Syan Chen, Philip S. Yu, Bing Liu
Publisher: Springer Verlag
Pages: 297-303
Number of pages: 7
ISBN (Print): 9783540437048
State: Published - 2002
Event: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002 - Taipei, Taiwan
Duration: 6 May 2002 → 8 May 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2336
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002
Country/Territory: Taiwan
City: Taipei
Period: 6/05/02 → 8/05/02

