Description of the NCU Chinese word segmentation and named entity recognition system for SIGHAN Bakeoff 2006

Yu Chieh Wu, Jie Chi Yang, Qian Xiang Lin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

Many Asian languages, Chinese in particular, differ from most Western languages in that text is written as a continuous sequence with no separators between words. A preliminary step in processing such languages is therefore to find the word boundaries. In this paper, we present a general-purpose model for both Chinese word segmentation and named entity recognition (NER). The model is built on word sequence classification with a probabilistic model, namely conditional random fields (CRF). We used a simple feature set for the CRF, which achieves satisfactory classification results on both tasks. Our model achieved an F-measure of 91.00 on the UPUC-Treebank data and 78.71 on the NER task.

Original language: English
Title of host publication: COLING/ACL 2006 - 5th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 209-212
Number of pages: 4
ISBN (electronic): 1932432701, 9781932432701
Publication status: Published - 2006
Event: 5th SIGHAN Workshop on Chinese Language Processing, co-located with COLING/ACL 2006 - Sydney, Australia
Duration: 22 Jul 2006 to 23 Jul 2006

Publication series

Name: COLING/ACL 2006 - 5th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop

Conference

Conference: 5th SIGHAN Workshop on Chinese Language Processing, co-located with COLING/ACL 2006
Country/Territory: Australia
City: Sydney
Period: 22/07/06 to 23/07/06
