Automatic information extraction for multiple singular web pages

Chia Hui Chang, Shih Chien Kuo, Kuo Yu Hwang, Tsung Hsin Ho, Chih Lung Lin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

3 Citations (Scopus)

Abstract

The World Wide Web is now undeniably the richest and most dense source of information, yet its structure makes it difficult to make use of that information in a systematic way. This paper extends a pattern discovery approach called IEPAD to the rapid generation of information extractors that can extract structured data from semi-structured Web documents. IEPAD was proposed to automate wrapper generation from a multiple-record Web page without user-labeled examples. In this paper, we consider another case, in which multiple Web pages are available but each input Web page contains only one record (called singular Web pages). To solve this case, a hierarchical multiple string alignment is proposed to allow wrapper induction for multiple singular Web pages.
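
As a rough illustration of the idea in the abstract, the sketch below aligns the tag sequences of several single-record pages and treats text positions whose content varies across pages as data fields. It is not the IEPAD implementation or the paper's hierarchical alignment: it is a flat, simplified stand-in that uses Python's `difflib.SequenceMatcher` for the pairwise alignment step, and every name in it (`tokenize`, `common_template`, `extract`, `induce_and_extract`, the `<TEXT>` placeholder, the sample pages) is a hypothetical choice made for this example.

```python
import re
from difflib import SequenceMatcher

TEXT = "<TEXT>"  # assumed placeholder for any run of text between tags


def tokenize(html):
    """Turn a page into (name, value) tokens: tags keep their markup,
    text runs are abstracted to TEXT and carry their content as value."""
    tokens = []
    for part in re.findall(r"<[^>]+>|[^<]+", html):
        part = part.strip()
        if not part:
            continue
        if part.startswith("<"):
            tokens.append((part.lower(), None))
        else:
            tokens.append((TEXT, part))
    return tokens


def common_template(name_seqs):
    """Fold the pages into one shared token subsequence (the 'template').
    SequenceMatcher yields a common subsequence, not always the longest."""
    template = list(name_seqs[0])
    for seq in name_seqs[1:]:
        sm = SequenceMatcher(None, template, seq, autojunk=False)
        template = [
            tok
            for blk in sm.get_matching_blocks()
            for tok in template[blk.a:blk.a + blk.size]
        ]
    return template


def extract(template, tokens):
    """Align one page to the template; map each TEXT slot to its content."""
    names = [name for name, _ in tokens]
    sm = SequenceMatcher(None, template, names, autojunk=False)
    row = {}
    for blk in sm.get_matching_blocks():
        for k in range(blk.size):
            if template[blk.a + k] == TEXT:
                row[blk.a + k] = tokens[blk.b + k][1]
    return row


def induce_and_extract(pages):
    """Return one record per page, keeping only slots that vary across pages."""
    token_lists = [tokenize(p) for p in pages]
    template = common_template([[n for n, _ in t] for t in token_lists])
    rows = [extract(template, t) for t in token_lists]
    slots = [i for i, name in enumerate(template) if name == TEXT]
    varying = [i for i in slots if len({r.get(i) for r in rows}) > 1]
    return [[r.get(i) for i in varying] for r in rows]


# Made-up single-record pages; the third has an extra element.
pages = [
    "<html><body><h1>Book A</h1><p>Price: <b>10</b></p></body></html>",
    "<html><body><h1>Book B</h1><p>Price: <b>25</b></p></body></html>",
    "<html><body><h1>Book C</h1><p>Price: <b>7</b></p><i>sale</i></body></html>",
]
records = induce_and_extract(pages)
print(records)
```

In this toy setup the constant label "Price:" is shared by all pages and is therefore treated as part of the template, while the title and price differ and are returned as fields. Optional or repeated substructures, which the hierarchical alignment in the paper is meant to address, are outside the scope of this sketch.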

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 6th Pacific-Asia Conference, PAKDD 2002, Proceedings
Editors: Ming-Syan Chen, Philip S. Yu, Bing Liu
Publisher: Springer Verlag
Pages: 297-303
Number of pages: 7
ISBN (Print): 9783540437048
Publication status: Published - 2002
Event: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002 - Taipei, Taiwan
Duration: 6 May 2002 to 8 May 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2336
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002
Country/Territory: Taiwan
City: Taipei
Period: 6/05/02 to 8/05/02
