PTT 網站餐廳美食類別擷取之研究

Translated title of the contribution: A study of restaurant information and food type extraction from PTT

Chih Yu Chung, Chien Lung Chou, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

In this study, we aim to develop a system that automatically extracts restaurant types from the FOOD board of PTT, the largest BBS site in Taiwan. The paper has three parts. The first is pre-processing: we crawl articles from the PTT FOOD board and use regular expressions to extract the title, restaurant name, telephone number, address, and URL. The second is restaurant type labeling from title data, for which we used WIDM NER TOOL to train a restaurant type extraction model. The third is the experiment: we randomly selected 10,000 titles for manual labeling and testing, used the labeled data for supervised learning, and added unlabeled data for semi-supervised learning. This method gave good results for restaurant type extraction.
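
The pre-processing step described above can be illustrated with a minimal Python sketch. The abstract does not give the actual regular expressions or the post layout, so everything here is an assumption for illustration: the field labels (標題, 餐廳名稱, 地址, 電話, 網址), the sample post, the phone pattern, and the function name extract_fields are all hypothetical.

```python
import re

# Hypothetical label-to-field mapping; real PTT posts may use other labels.
LABELS = {
    "標題": "title",
    "餐廳名稱": "name",
    "地址": "address",
    "電話": "telephone",
    "網址": "url",
}

# One "label: value" pair per line, accepting ASCII or full-width colons.
FIELD_RE = re.compile(
    r"^[ \t]*(" + "|".join(LABELS) + r")[ \t]*[::][ \t]*(\S.*?)[ \t]*$",
    re.MULTILINE,
)

# Assumed shape of a Taiwanese landline number, e.g. (02)1234-5678.
PHONE_RE = re.compile(r"\(?0\d{1,2}\)?[-\s]?\d{3,4}[-\s]?\d{4}")
URL_RE = re.compile(r"https?://\S+")


def extract_fields(post):
    """Return a dict of the five fields; a missing field is None."""
    record = dict.fromkeys(LABELS.values())
    for label, value in FIELD_RE.findall(post):
        key = LABELS[label]
        if record[key] is None:  # keep the first occurrence of each field
            record[key] = value
    # Strip trailing remarks from the telephone line, keeping only the number.
    if record["telephone"]:
        match = PHONE_RE.search(record["telephone"])
        record["telephone"] = match.group(0) if match else None
    # Fall back to the first URL anywhere in the post body.
    if record["url"] is None:
        match = URL_RE.search(post)
        record["url"] = match.group(0) if match else None
    return record


# Made-up sample post, not taken from the paper's data.
SAMPLE_POST = """標題: [食記] 台北 範例拉麵
餐廳名稱:範例拉麵
地址:台北市中正區範例路1號
電話:(02)1234-5678 (週一公休)
網址:http://example.com/ramen
"""

print(extract_fields(SAMPLE_POST))
```

Matching labeled lines first and falling back to a free-text URL search is only one possible design; posts that omit the labels would need looser patterns.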

Original language: Chinese (Traditional)
Title of host publication: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017
Editors: Lun-Wei Ku, Yu Tsao, Chi-Chun Lee, Cheng-Zen Yang, Hung-Yi Lee, Richard T.-H. Tsai, Wen-Hsiang Lu, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 183-196
Number of pages: 14
ISBN (Electronic): 9789869576901
State: Published - 1 Nov 2017
Event: 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017 - Taipei, Taiwan
Duration: 27 Nov 2017 to 28 Nov 2017

Publication series

Name: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017

Conference

Conference: 29th Conference on Computational Linguistics and Speech Processing, ROCLING 2017
Country/Territory: Taiwan
City: Taipei
Period: 27/11/17 to 28/11/17
