Efficient and robust phrase chunking using support vector machines

Yu Chieh Wu, Jie Chi Yang, Yue Shi Lee, Show Jane Yen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Automatic text chunking aims to recognize phrase structures in natural language text. It is a key technology for knowledge-based systems, where phrase structures provide important syntactic information for knowledge representation. Support vector machine (SVM)-based phrase chunking systems have been shown to achieve high performance on text chunking, but their inefficiency limits practical use on large datasets: such systems handle only several thousand tokens per second. In this paper, we first show that conventional SVM learning achieves state-of-the-art performance (94.25) on the CoNLL-2000 shared task. However, off-the-shelf SVM classifiers become inefficient as the number of phrase types grows. We therefore present two novel methods that make the system substantially faster in both training and testing at the cost of only a slight decrease in performance. Experimental results show that our method achieves an F rate of 94.09 while handling 13,000 tokens per second on the CoNLL-2000 chunking task.

Original language: English
Title of host publication: Information Retrieval Technology - Third Asia Information Retrieval Symposium, AIRS 2006, Proceedings
Publisher: Springer Verlag
Pages: 350-361
Number of pages: 12
ISBN (Print): 3540457801, 9783540457800
State: Published - 2006
Event: 3rd Asia Information Retrieval Symposium, AIRS 2006 - Singapore, Singapore
Duration: 16 Oct 2006 - 18 Oct 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Asia Information Retrieval Symposium, AIRS 2006
Country/Territory: Singapore
City: Singapore
Period: 16/10/06 - 18/10/06
