A study of restaurant information and food type extraction from PTT (PTT 網站餐廳美食類別擷取之研究)

Chih Yu Chung, Chien Lung Chou, Chia Hui Chang

Research output: Contribution to book/report › Conference paper › Peer-reviewed

1 citation (Scopus)

Abstract

This study develops a system that automatically extracts restaurant types from the FOOD board of PTT, the largest BBS site in Taiwan. The work has three parts. The first is pre-processing: we crawl articles from the PTT FOOD board and use regular expressions to extract the title, restaurant name, telephone number, address, and URL. The second is restaurant type labeling from title data, for which we use the WIDM NER TOOL to train an extraction model. The third is the experiment: we randomly selected 10,000 titles for manual labeling and testing, used the labeled data for supervised learning, and added unlabeled data for semi-supervised learning. With this approach, the model achieved good results on restaurant type extraction.
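The pre-processing step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual regular expressions, so everything here is an assumption made for illustration: the patterns, the field labels (餐廳名稱, 店名, 地址, 電話), the "label: value" post layout, and the function name `extract_fields` are all hypothetical.

```python
import re

# Hypothetical patterns: the abstract does not list the expressions the authors used.
# Taiwanese phone numbers such as (02)2345-6789 or 0912-345-678.
PHONE_RE = re.compile(r"\(?0\d{1,2}\)?[-\s]?\d{3,4}[-\s]?\d{3,4}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# Assumed post layout: one "label: value" pair per line, ASCII or full-width colon.
FIELD_RE = re.compile(
    r"^\s*(餐廳名稱|店名|地址|電話)\s*[::]\s*(.+?)\s*$", re.MULTILINE
)

# Map the assumed Chinese labels to record keys.
LABEL_TO_KEY = {
    "餐廳名稱": "name",
    "店名": "name",
    "地址": "address",
    "電話": "telephone",
}


def extract_fields(title, body):
    """Return a dict with title, name, telephone, address, and url (None if absent)."""
    record = {
        "title": title.strip(),
        "name": None,
        "telephone": None,
        "address": None,
        "url": None,
    }
    # Keep the first value seen for each labeled field.
    for label, value in FIELD_RE.findall(body):
        key = LABEL_TO_KEY[label]
        if record[key] is None:
            record[key] = value
    # Reduce the telephone field to its phone-like substring, or fall back
    # to the first phone-like string anywhere in the body.
    phone_source = record["telephone"] if record["telephone"] else body
    phone_match = PHONE_RE.search(phone_source)
    record["telephone"] = phone_match.group(0) if phone_match else None
    # Take the first URL found anywhere in the body.
    url_match = URL_RE.search(body)
    if url_match:
        record["url"] = url_match.group(0)
    return record


if __name__ == "__main__":
    sample_body = (
        "餐廳名稱:某某拉麵\n"
        "電話:(02)2345-6789\n"
        "地址:台北市大安區某某路1號\n"
        "網址:https://example.com/ramen\n"
    )
    print(extract_fields("[食記] 台北 某某拉麵", sample_body))
```

Real posts are far less regular than this sample, so a production extractor would likely need more label variants and looser patterns than the sketch assumes.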

Translated title of the contribution: A study of restaurant information and food type extraction from PTT
Original language: Traditional Chinese
Title of host publication: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017
Editors: Lun-Wei Ku, Yu Tsao, Chi-Chun Lee, Cheng-Zen Yang, Hung-Yi Lee, Richard T.-H. Tsai, Wen-Hsiang Lu, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 183-196
Number of pages: 14
ISBN (electronic): 9789869576901
Publication status: Published - 1 Nov 2017
Event: 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017 - Taipei, Taiwan
Duration: 27 Nov 2017 to 28 Nov 2017

Publication series

Name: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017

Conference

Conference: 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017
Country/Territory: Taiwan
City: Taipei
Period: 27/11/17 to 28/11/17

Keywords

  • Distant Learning
  • Machine Learning
  • Named Entity Recognition
  • Tri-Training
