AFIS: Aligning detail-pages for full schema induction

Oviliani Yenty Yuliana, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

3 Citations (Scopus)

Abstract

Web data extraction is an essential task for web data integration. Most research focuses on data extraction from list-pages by detecting data-rich sections and segmenting record boundaries. Detail-pages, however, contain all-inclusive product information on each page, so the number of data attributes that need to be aligned is much larger. In this paper, we formulate the data extraction problem as the alignment of leaf nodes from DOM trees. We propose AFIS, Annotation-Free Induction of Full Schema, for detail-pages. AFIS applies Divide-and-Conquer and Longest Increasing Sequence (LIS) algorithms to mine landmarks from the input. The experiments show that AFIS (F1 0.990) outperforms RoadRunner, FivaTech and TEX in terms of selected data. For full schema evaluation (all data), AFIS also achieves the highest average performance (F1 0.937) compared with TEX and RoadRunner.
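
The abstract does not describe how LIS is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each page has already been flattened into a list of leaf-node text strings, treats texts that occur exactly once in both pages as candidate landmarks, and uses a standard longest increasing subsequence routine to keep the candidates that appear in the same relative order on both pages. The helper names `longest_increasing_subsequence` and `candidate_landmarks` and the sample pages are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []        # tails[k]: index of the smallest tail of a run of length k + 1
    tail_values = []  # values at those indices, kept sorted for bisect
    prev = [None] * len(values)  # back-pointers used to rebuild the subsequence
    for i, v in enumerate(values):
        k = bisect_left(tail_values, v)
        if k == len(tails):
            tails.append(i)
            tail_values.append(v)
        else:
            tails[k] = i
            tail_values[k] = v
        prev[i] = tails[k - 1] if k > 0 else None
    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tails[-1] if tails else None
    while i is not None:
        result.append(i)
        i = prev[i]
    return result[::-1]


def candidate_landmarks(page_a, page_b):
    """Hypothetical helper: leaf texts unique in both pages and in consistent order."""
    count_a = Counter(page_a)
    count_b = Counter(page_b)
    position_b = {text: j for j, text in enumerate(page_b)}
    # Pair each uniquely shared text with its position in page_b, in page_a order.
    shared = [
        (text, position_b[text])
        for text in page_a
        if count_a[text] == 1 and count_b.get(text, 0) == 1
    ]
    keep = longest_increasing_subsequence([j for _, j in shared])
    return [shared[k][0] for k in keep]


# Assumed sample data: leaf-node texts of two detail-pages from the same template.
page_a = ["Title", "Price", "$10", "Brand", "Acme", "Weight", "Reviews"]
page_b = ["Title", "Weight", "Price", "$25", "Brand", "Zed", "Reviews"]
landmarks = candidate_landmarks(page_a, page_b)
print(landmarks)  # ['Title', 'Price', 'Brand', 'Reviews']
```

In this sketch "Weight" is dropped because it appears out of order on the second page, while the remaining labels form an order-preserving set that could split both pages into smaller segments for further alignment.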

Original language: English
Title of host publication: TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 220-227
Number of pages: 8
ISBN (Electronic): 9781509057320
DOIs
Publication status: Published - 16 Mar 2017
Event: 2016 Conference on Technologies and Applications of Artificial Intelligence, TAAI 2016 - Hsinchu, Taiwan
Duration: 25 Nov 2016 to 27 Nov 2016

Publication series

Name: TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings

Conference

Conference: 2016 Conference on Technologies and Applications of Artificial Intelligence, TAAI 2016
Country/Territory: Taiwan
City: Hsinchu
Period: 25/11/16 to 27/11/16
