Description of the NCU Chinese word segmentation and part-of-speech tagging for SIGHAN Bakeoff 2007

Yu Chieh Wu, Jie Chi Yang, Yue Shi Lee

Research output: Contribution to conference › Paper › peer-review

2 Scopus citations

Abstract

In Chinese, most language processing starts with word segmentation (WS) and part-of-speech (POS) tagging. These two steps tokenize words from a sequence of characters and predict a syntactic label for each segmented word. In this paper, we present two distinct sequential tagging models for these two tasks. The first, the word segmentation model, is similar to previous work that used conditional random fields (CRF) and a set of predefined dictionaries to recognize word boundaries. The second is a revised support vector machine-based chunking model that labels POS tags in the tagging task. Our method achieves a moderate rank among all participants in the WS task, while in the POS tagging task it reaches very competitive results.
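To illustrate how word segmentation can be cast as a sequential tagging problem, the sketch below converts a segmented sentence into per-character boundary tags and back. The B/I/E/S tag set, the function names, and the example sentence are assumptions chosen for illustration; the abstract does not state which tag scheme or features the authors used, and no CRF or SVM training is shown here.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map segmented words to per-character tags.

    Assumed tag set (not taken from the paper):
    B = word start, I = word-internal, E = word end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty strings so they produce no tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character sequence and its boundary tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # keep any trailing characters left by an incomplete tag sequence
        words.append(current)
    return words


# Hypothetical example sentence, already segmented into three words.
segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

A sequence tagger such as a CRF would be trained to predict one of these tags for each character; decoding the predicted tags with a routine like `tags_to_words` then yields the word boundaries.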

Original language: English
Pages: 161-166
Number of pages: 6
State: Published - 2008
Event: 6th SIGHAN Workshop on Chinese Language Processing, SIGHAN 2008 - Hyderabad, India
Duration: 11 Jan 2008 to 12 Jan 2008

Conference

Conference: 6th SIGHAN Workshop on Chinese Language Processing, SIGHAN 2008
Country/Territory: India
City: Hyderabad
Period: 11/01/08 to 12/01/08
