Abstract
A semi-supervised information extraction (IE) system, OLERA (On-Line Extraction Rule Analysis), proposed by Chia-Hui Chang of National Central University, Taiwan, and Shih-Chien Kou of Trend Micro, Taiwan, is described. The system allows users to train extraction rules from semistructured Web pages with minimal effort and without detailed annotation of the training documents. OLERA offers visual interaction by displaying discovered records in a spreadsheet-like table for schema assignment. It performs well on program-generated Web pages, requiring only a few training pages and limited user intervention.
Original language | English |
---|---|
Pages (from-to) | 56-64 |
Number of pages | 9 |
Journal | IEEE Intelligent Systems |
Volume | 19 |
Issue number | 6 |
State | Published - Nov 2004 |