Developing learner corpus annotation for Chinese grammatical errors

Lung Hao Lee, Li Ping Chang, Yuen Hsien Tseng

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

12 Scopus citations

Abstract

This study describes the construction of the TOCFL (Test of Chinese as a Foreign Language) learner corpus, including the collection and grammatical error annotation of 2,837 essays written by learners of Chinese from 46 different mother-tongue languages. We propose hierarchical tagging sets for manually annotating grammatical errors; applying them identified 33,835 inappropriate usages. The corpus has been provided for shared tasks on Chinese grammatical error diagnosis, which demonstrates the usability of our learner corpus annotation.

Original language: English
Title of host publication: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016
Editors: Minghui Dong, Chung-Hsien Wu, Yanfeng Lu, Haizhou Li, Yuen-Hsien Tseng, Liang-Chih Yu, Lung-Hao Lee
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 254-257
Number of pages: 4
ISBN (Electronic): 9781509009213
State: Published - 10 Mar 2017
Event: 20th International Conference on Asian Language Processing, IALP 2016 - Tainan, Taiwan
Duration: 21 Nov 2016 - 23 Nov 2016

Publication series

Name: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016

Conference

Conference: 20th International Conference on Asian Language Processing, IALP 2016
Country/Territory: Taiwan
City: Tainan
Period: 21/11/16 - 23/11/16

Keywords

  • computer-assisted language learning
  • error schema
  • error tagging
  • grammatical error diagnosis
  • interlanguage
  • second language acquisition
