Unsupervised overlapping feature selection for conditional random fields learning in Chinese word segmentation

Ting Hao Yang, Tian Jian Jiang, Chan Hung Kuo, Richard Tzong Han Tsai, Wen Lian Hsu

Research output: Contribution to conference › Paper › Peer-review

2 Citations (Scopus)

Abstract

This work presents several unsupervised feature selection methods based on frequent strings that help improve the conditional random fields (CRF) model for Chinese word segmentation (CWS). The features are character-based N-grams (CNG), Accessor Variety based strings (AVS), and Term Contributed Frequency (TCF), each combined with a specific manner of boundary overlapping. In the experiments, the baseline is the 6-tag labeling scheme, a state-of-the-art scheme for CRF-based CWS, and the data set is taken from the SIGHAN CWS bakeoff 2005. The results show that all of these features improve our system's F1 measure (F) and its recall of out-of-vocabulary words (ROOV). In particular, feature collections that contain the AVS feature outperform other types of features in terms of F, whereas feature collections containing TCB/TCF information achieve better ROOV.
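Accessor Variety (AV) is the statistic behind the AVS features, but the abstract does not define it. The following is a minimal sketch under the commonly used definition: the AV of a string is the smaller of two counts, the number of distinct characters that appear immediately before it in the corpus and the number of distinct characters that appear immediately after it. Several details are assumptions made for illustration: the function name `accessor_variety`, the `max_len` cap, and the use of a single `<s>`/`</s>` marker for sentence boundaries. Some formulations instead count each sentence boundary as a separate accessor. The sketch also does not show how the paper turns AV scores into CRF features.

```python
from collections import defaultdict

# Hypothetical boundary markers; this sketch treats every sentence start
# (or end) as one shared accessor symbol.
BOS, EOS = "<s>", "</s>"


def accessor_variety(corpus, max_len=4):
    """Return {substring: AV} for all substrings of up to max_len characters.

    AV(s) = min(#distinct left neighbours of s, #distinct right neighbours of s)
    """
    left = defaultdict(set)   # substring -> set of preceding characters
    right = defaultdict(set)  # substring -> set of following characters
    for sentence in corpus:
        n = len(sentence)
        for i in range(n):
            # Enumerate substrings sentence[i:j] of length 1..max_len.
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sentence[i:j]
                left[s].add(sentence[i - 1] if i > 0 else BOS)
                right[s].add(sentence[j] if j < n else EOS)
    return {s: min(len(left[s]), len(right[s])) for s in left}


corpus = ["我們喜歡自然語言", "他們喜歡中文", "喜歡學習"]
av = accessor_variety(corpus)
print(av["喜歡"])  # 2
```

In this toy corpus, "喜歡" is preceded by 們 or a sentence start and followed by 自, 中, or 學, so its AV is min(2, 3) = 2. A high AV suggests that a string appears in many different contexts, which makes it a plausible word candidate.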

Original language: English
Pages: 109-122
Number of pages: 14
Publication status: Published - 2011
Event: 23rd Conference on Computational Linguistics and Speech Processing, ROCLING 2011 - Taipei, Taiwan
Duration: 8 Sep 2011 to 9 Sep 2011


Conference: 23rd Conference on Computational Linguistics and Speech Processing, ROCLING 2011
Country/Territory: Taiwan
City: Taipei
Period: 8/09/11 to 9/09/11

