In Chinese, most language processing starts with word segmentation (WS) and part-of-speech (POS) tagging. These two steps segment a sequence of characters into words and predict a syntactic label for each segmented word. In this paper, we present two distinct sequential tagging models for these two tasks. Our word segmentation model is largely similar to previous work: it uses conditional random fields (CRFs) and a set of predefined dictionaries to recognize word boundaries. For the POS tagging task, we revise a support vector machine (SVM)-based chunking model to assign a POS tag to each word. Our method achieves a moderate ranking among all participants in the WS task and very competitive results in the POS tagging task.
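The abstract does not state the exact tag set or feature template used for CRF-based segmentation. The sketch below assumes the common B/I/E/S character-tagging scheme (Begin, Inside, End, Single) and a hypothetical feature template made of a character window plus dictionary-match flags. The names `words_to_bies`, `bies_to_words`, `char_features`, `DictB`, and `DictE` are illustrative only. The sketch shows how segmentation becomes a per-character labeling problem; it stops short of training a CRF.

```python
from typing import Dict, List, Set


def words_to_bies(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one B/I/E/S tag per character."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.append("B")  # word-initial character
            tags.extend(["I"] * (len(w) - 2))  # word-internal characters
            tags.append("E")  # word-final character
    return tags


def bies_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word boundary follows this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that never closes its last word
        words.append(current)
    return words


def char_features(chars: str, i: int, lexicon: Set[str], max_len: int = 4) -> Dict[str, str]:
    """Hypothetical feature template for character i: a +/-2 character
    window, two character bigrams, and dictionary-match flags."""

    def at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"C{k}": at(i + k) for k in (-2, -1, 0, 1, 2)}
    feats["C-1C0"] = at(i - 1) + at(i)
    feats["C0C1"] = at(i) + at(i + 1)
    # Length of the longest lexicon word that starts at i (0 = no match).
    begin = max(
        (n for n in range(2, max_len + 1)
         if i + n <= len(chars) and chars[i:i + n] in lexicon),
        default=0,
    )
    # Length of the longest lexicon word that ends at i (0 = no match).
    end = max(
        (n for n in range(2, max_len + 1)
         if i - n + 1 >= 0 and chars[i - n + 1:i + 1] in lexicon),
        default=0,
    )
    feats["DictB"] = str(begin)
    feats["DictE"] = str(end)
    return feats
```

For example, with the toy lexicon `{"研究", "自然", "语言", "自然语言", "处理"}`, the character 自 in 他研究自然语言处理 gets `DictB = "4"`, since the dictionary word 自然语言 starts there. A CRF would learn how strongly such flags should push the label toward `B`. Passing the match length rather than a yes/no flag lets the model tell short dictionary matches apart from long ones.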
Number of pages: 6
Publication status: Published, 2008
Event: 6th SIGHAN Workshop on Chinese Language Processing (SIGHAN 2008), Hyderabad, India
Duration: 11 Jan 2008 → 12 Jan 2008