The construction of a Chinese named entity tagged corpus: CNEC1.0

Cheng Wei Shih, Tzong Han Tsai, Shih Hung Wu, Chiu Chen Hsieh, Wen Lian Hsu

Research output: Contribution to conference › Paper › peer-review

Abstract

To build a machine-learning-based automatic named entity recognition (NER) system, a large tagged corpus is necessary. This paper describes the manual construction of a Chinese named entity tagged corpus (CNEC 1.0) that can be used to improve NER performance. In this project, we define five named entity tags: PER (person name), LOC (location name), ORG (organization name), LAO (location as organization), and OAL (organization as location). In addition, we propose a special tag, DIFF (Difficulty), to annotate ambiguous cases during corpus construction. A corpus-annotating procedure, a tagging tool, and an original corpus are also introduced. Finally, we present part of our manually tagged corpus.
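As an illustration only, the tag set named in the abstract could be modelled as follows. This is a minimal Python sketch, not the corpus's actual file format or the authors' tagging tool: the `NETag` enum, the `Annotation` record with character offsets, and the `needs_review` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class NETag(Enum):
    """Tags listed in the abstract; the values repeat its glosses."""
    PER = "person name"
    LOC = "location name"
    ORG = "organization name"
    LAO = "location as organization"
    OAL = "organization as location"
    DIFF = "difficulty (ambiguous case)"  # special tag, not an entity category


@dataclass(frozen=True)
class Annotation:
    # Hypothetical layout: character offsets into a sentence, end exclusive.
    start: int
    end: int
    tag: NETag


def needs_review(annotations):
    """Return the annotations tagged DIFF, i.e. flagged as ambiguous."""
    return [a for a in annotations if a.tag is NETag.DIFF]
```

Under these assumptions, collecting DIFF spans in one pass gives annotators a simple queue of ambiguous cases to revisit.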

Original language: English
State: Published - 2021
