The construction of a Chinese named entity tagged corpus: CNEC1.0

Cheng Wei Shih, Tzong Han Tsai, Shih Hung Wu, Chiu Chen Hsieh, Wen Lian Hsu

Research output: Contribution to conference › Conference paper › Peer-review

Abstract

Building an automatic named entity recognition (NER) system with machine learning requires a large tagged corpus. This paper describes the manual construction of a Chinese named entity tagged corpus (CNEC 1.0) that can be used to improve NER performance. In this project, we define five named entity tags for named entity categories: PER (person name), LOC (location name), ORG (organization name), LAO (location as organization), and OAL (organization as location). In addition, we propose a special tag, DIFF (Difficulty), to annotate ambiguous cases during corpus construction. A corpus-annotating procedure, a tagging tool, and an original corpus are also introduced. Finally, we present part of our manually tagged corpus.
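To make the tag inventory concrete, here is a minimal Python sketch that models the six CNEC 1.0 tags (PER, LOC, ORG, LAO, OAL, DIFF) as an enum and pulls tagged spans out of a sentence. The inline markup `<TAG>...</TAG>`, the name `NETag`, the function `extract_entities`, and the sample sentence are all hypothetical illustrations; the abstract does not describe the corpus's actual file format.

```python
import re
from enum import Enum


class NETag(Enum):
    """The six tags named in the CNEC 1.0 abstract."""
    PER = "person name"
    LOC = "location name"
    ORG = "organization name"
    LAO = "location as organization"
    OAL = "organization as location"
    DIFF = "difficulty (ambiguous case)"


# Hypothetical inline markup such as <PER>...</PER>; this is an assumption,
# not the documented CNEC 1.0 format. The backreference \1 requires the
# closing tag to match the opening one.
_TAG_PATTERN = re.compile(
    r"<(" + "|".join(tag.name for tag in NETag) + r")>(.*?)</\1>"
)


def extract_entities(text: str) -> list[tuple[NETag, str]]:
    """Return (tag, surface string) pairs in order of appearance."""
    return [(NETag[m.group(1)], m.group(2)) for m in _TAG_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "<PER>王小明</PER>在<LOC>台北</LOC>加入<ORG>中央研究院</ORG>"
    for tag, surface in extract_entities(sample):
        print(tag.name, surface)
```

This sketch handles only flat, non-nested spans; a real annotation scheme for metonymic tags such as LAO and OAL, or a DIFF tag wrapping another entity, might need a nested or offset-based representation.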

Original language: English
Publication status: Published - 2021
