Description of the NCU Chinese word segmentation and named entity recognition system for SIGHAN Bakeoff 2006

Yu Chieh Wu, Jie Chi Yang, Qian Xiang Lin

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

5 Scopus citations

Abstract

Many Asian languages, Chinese in particular, differ from most Western languages in that words are written without separators. A preliminary step in processing such languages is therefore to find the boundaries between words. In this paper, we present a general-purpose model for both Chinese word segmentation and named entity recognition. The model treats both tasks as sequence classification with a probabilistic model, namely conditional random fields (CRF). We used a simple feature set for the CRF, which achieves satisfactory classification results on the two tasks. Our model achieved an F-score of 91.00 on the UPUC Treebank data and 78.71 on the NER task.
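As an illustration of how word segmentation can be cast as sequence classification and scored with an F-measure, here is a minimal Python sketch. The abstract does not state the tag set or the scoring script the authors used, so the B/I/E/S tagging scheme, the span-based F-score, and all function names below are assumptions made for illustration only. The CRF itself is not shown.

```python
from typing import List, Set, Tuple


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, tag) pairs.

    Assumed tag set (not taken from the paper): B = begin, I = inside,
    E = end of a multi-character word, S = single-character word.
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "I") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from per-character tags; a word closes on E or S."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by the tagger
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) character offset pair."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold: List[str], predicted: List[str]) -> float:
    """F-score over word spans: a word counts only if both boundaries match."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

In this sketch a sequence model such as a CRF would predict one tag per character, `tags_to_words` would turn those tags back into words, and `segmentation_f1` would compare the result with the reference segmentation. The function returns a value between 0 and 1, whereas the scores in the abstract are on a 0 to 100 scale.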

Original language: English
Title of host publication: COLING/ACL 2006 - 5th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 209-212
Number of pages: 4
ISBN (Electronic): 1932432701, 9781932432701
State: Published - 2006
Event: 5th SIGHAN Workshop on Chinese Language Processing, co-located with COLING/ACL 2006 - Sydney, Australia
Duration: 22 Jul 2006 to 23 Jul 2006

Publication series

Name: COLING/ACL 2006 - 5th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop

Conference

Conference: 5th SIGHAN Workshop on Chinese Language Processing, co-located with COLING/ACL 2006
Country/Territory: Australia
City: Sydney
Period: 22/07/06 to 23/07/06
