A tool for web NER model generation using search snippets of known entities (基於已知名稱搜尋結果的網路實體辨識模型建立工具)

Ya Yun Huang, Chia Hui Chang

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

2 citations (Scopus)

Abstract

Named entity recognition (NER) is of vital importance in information extraction and natural language processing. Current NER models are trained mainly on journalistic documents such as news articles. Because they have not been trained on informal text, their performance drops on Web documents, which may lack sentence structure and contain colloquial expressions; state-of-the-art NER systems therefore do not work well on Web documents. Users who want to recognize named entities in Web documents have to train a new model, which is labor-intensive and time-consuming. The preparatory work includes collecting a large set of training data, labeling named entities, selecting an appropriate segmentation method, unifying symbols, normalizing text, designing features, preparing dictionaries, and so on. Moreover, this work must be repeated for each language and each entity type. In this research, we propose an NER model generation tool for effective Web entity extraction. The tool trains NER models with a semi-supervised learning approach based on automatic labeling and tri-training, which makes use of unlabeled data and structured resources containing known named entities. Experiments confirm that the tool can be applied to different languages and to various types of named entities. For Chinese organization name extraction, the generated model achieves an F1 score of 86.1% on 38,692 sentences containing 16,241 distinct names, while the F1-measures for Japanese organization names, English organization names, Chinese location names, Chinese addresses, and English addresses reach 80.3%, 83.2%, 84.5%, 97.2%, and 94.8%, respectively.
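The two ingredients named in the abstract, automatic labeling and tri-training, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `auto_label` and `agreed_pseudo_labels`, the character-level B/I/O tagging, and the longest-match rule are all assumptions made for illustration. The second function shows only the agreement step of tri-training and omits the error-rate checks of the full algorithm.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def auto_label(sentence: str, known_names: List[str]) -> List[Tuple[str, str]]:
    """Tag each character with B/I/O by looking up known entity names.

    Hypothetical helper: character-level tagging is an assumption that
    sidesteps word segmentation for Chinese or Japanese text.
    """
    tags = ["O"] * len(sentence)
    # Try longer names first so a short name nested inside a longer one
    # does not overwrite the longer match.
    for name in sorted(set(known_names), key=len, reverse=True):
        if not name:
            continue
        start = sentence.find(name)
        while start != -1:
            end = start + len(name)
            # Only label spans that are still entirely unlabeled.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = sentence.find(name, start + 1)
    return list(zip(sentence, tags))


def agreed_pseudo_labels(
    predict_a: Callable[[Hashable], Hashable],
    predict_b: Callable[[Hashable], Hashable],
    unlabeled: Sequence[Hashable],
) -> List[Tuple[Hashable, Hashable]]:
    """Return (example, label) pairs on which two models agree.

    In tri-training, such pairs become extra training data for the third
    model; this sketch leaves out the error-rate conditions.
    """
    pairs = []
    for example in unlabeled:
        label_a, label_b = predict_a(example), predict_b(example)
        if label_a == label_b:
            pairs.append((example, label_a))
    return pairs


# Usage: label a snippet with a made-up list of known organization names.
labeled = auto_label("Acme Corp hired staff", ["Acme Corp", "Acme"])
```

Here `labeled` pairs every character of the snippet with its tag, and the nested name "Acme" does not override the longer match "Acme Corp".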

Translated title of the contribution: A tool for web NER model generation using search snippets of known entities
Original language: Traditional Chinese
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 148-163
Number of pages: 16
ISBN (electronic): 9789573079286
Publication status: Published - 1 Oct 2015
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 1 Oct 2015 to 2 Oct 2015

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country/Territory: Taiwan
City: Hsinchu
Period: 1/10/15 to 2/10/15

Keywords

  • Co-Training
  • Named Entity Recognition
  • Tri-Training
