Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction

Chien Lung Chou, Chia Hui Chang, Shin Yi Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Named entity extraction is a fundamental task for many knowledge engineering applications. Existing studies rely on annotated training data, which is expensive to obtain at scale, and this limits recognition effectiveness. In this research, we propose an automatic labeling procedure that prepares training data from structured resources containing known named entities. Because this automatically labeled training data may contain noise, a follow-up self-testing procedure can be used to remove low-confidence annotations and increase extraction performance with less training data. In addition to preparing labeled training data, we also employ semi-supervised learning to make use of large amounts of unlabeled data. By modifying tri-training for sequence labeling and deriving a proper initialization, we can further improve entity extraction. In the task of Chinese person name extraction with 364,685 sentences (8,672 news articles) and 54,449 person names (11,856 distinct), an F-measure of 90.4% can be achieved.
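The automatic labeling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching strategy or tag scheme, so everything below is an assumption made for illustration: the function name `auto_label`, the `known_names` set, character-level tagging, longest-match lookup, and the `B-PER` / `I-PER` / `O` labels are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def auto_label(sentence: str, known_names: Set[str]) -> List[str]:
    """Tag each character as B-PER, I-PER, or O by dictionary lookup.

    Assumption: names are matched greedily, longest candidate first,
    scanning the sentence from left to right.
    """
    tags = ["O"] * len(sentence)
    max_len = max((len(name) for name in known_names), default=0)
    i = 0
    while i < len(sentence):
        matched = 0
        # Try the longest possible span first so a full name wins
        # over any shorter name that is a prefix of it.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in known_names:
                matched = length
                break
        if matched:
            tags[i] = "B-PER"
            for j in range(i + 1, i + matched):
                tags[j] = "I-PER"
            i += matched
        else:
            i += 1
    return tags


# Hypothetical example data, not drawn from the paper's corpus.
known = {"王小明", "李華"}
labels = auto_label("王小明和李華開會", known)
```

Labels produced this way are noisy (a known name can match text that is not a person mention, and unknown names are missed), which is the kind of noise the self-testing step in the abstract is meant to reduce.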

Original language: English
Title of host publication: SWAIE 2014 - 3rd Workshop on Semantic Web and Information Extraction, Proceedings of the Workshop
Editors: Diana Maynard, Marieke van Erp, Brian Davis
Publisher: Association for Computational Linguistics (ACL)
Pages: 33-40
Number of pages: 8
ISBN (Electronic): 9781873769485
State: Published - 2014
Event: 3rd Workshop on Semantic Web and Information Extraction, SWAIE 2014 - Dublin, Ireland
Duration: 24 Aug 2014 → …

Publication series

Name: SWAIE 2014 - 3rd Workshop on Semantic Web and Information Extraction, Proceedings of the Workshop

Conference

Conference: 3rd Workshop on Semantic Web and Information Extraction, SWAIE 2014
Country/Territory: Ireland
City: Dublin
Period: 24/08/14 → …
