Mining features for Web NER model construction based on distant learning

Chien Lung Chou, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

In this paper, we study the problem of developing a WIDM NER tool that prepares a training corpus from the Web for custom named entity recognition (NER) models via distant learning. We consider two major issues: efficient automatic labelling and effective feature mining for training accurate NER models with sequence labelling techniques. While the idea of collecting training sentences from search snippets via known entities (seeds) is not new, efficient automatic labelling becomes an issue when the number of seeds (e.g. 500K) and sentences (e.g. 2M) is large. The second issue concerns mining interesting terms or k-grams as features for supervised learning. We conduct experiments on four types of entity recognition, namely Chinese person names, food names, location names, and points of interest (POI), to demonstrate the improvement in efficiency and effectiveness achieved with the proposed Web NER model construction tool.
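The abstract does not say how the automatic labelling is implemented. As a rough illustration only, the sketch below shows one common way to label sentences with a large seed list: store the seeds in a character trie and scan each sentence with longest-match lookup, so the cost per sentence depends on its length and on the longest seed rather than on the number of seeds. The names `build_trie`, `label_sentence`, and the `END` sentinel are hypothetical, and the character-level B/I/O scheme is an assumption, not the authors' method.

```python
from typing import Dict, List

END = "$end"  # hypothetical sentinel key marking the end of a seed in the trie


def build_trie(seeds: List[str]) -> Dict:
    """Store every seed entity character by character in a nested dict."""
    root: Dict = {}
    for seed in seeds:
        node = root
        for ch in seed:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def label_sentence(sentence: str, trie: Dict) -> List[str]:
    """Tag each character with B/I/O using longest-match lookup in the trie."""
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        node, j, match_end = trie, i, -1
        # Walk the trie as far as the sentence allows, starting at position i.
        while j < len(sentence) and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            if END in node:
                match_end = j  # remember the longest seed seen so far
        if match_end > i:
            tags[i] = "B"
            for k in range(i + 1, match_end):
                tags[k] = "I"
            i = match_end  # continue after the matched entity
        else:
            i += 1
    return tags
```

For example, with the seeds `["台北", "台北市"]`, calling `label_sentence("我住在台北市中心", build_trie(["台北", "台北市"]))` tags the three characters of the longer seed as `B`, `I`, `I` and everything else as `O`. The resulting tag sequences could then be fed to a sequence labelling learner together with mined term or k-gram features.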

Original language: English
Title of host publication: Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
Editors: Rong Tong, Yue Zhang, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 322-325
Number of pages: 4
ISBN (Electronic): 9781538619803
State: Published - 2 Jul 2017
Event: 21st International Conference on Asian Language Processing, IALP 2017 - Singapore, Singapore
Duration: 5 Dec 2017 to 7 Dec 2017

Publication series

Name: Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
Volume: 2018-January

Conference

Conference: 21st International Conference on Asian Language Processing, IALP 2017
Country/Territory: Singapore
City: Singapore
Period: 5/12/17 to 7/12/17

Keywords

  • Distant learning
  • Features mining
  • Scalable automatic labeling
  • Semi-supervised Learning
  • Sequence labeling
