基於 Web 之商家景點擷取與資料庫建置 (Points of interest extraction from unstructured web)

Ting Yao Kao, Hsiu Min Chuang, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

With the increased popularity of mobile devices, local search has become a popular new service, and it needs a comprehensive POI (Point of Interest) database to support it. In recent years, the web has become the largest data source of POIs. With the prevalence of the Internet, people share their travel experiences and information about the POIs they have visited on social networks, blogs, and even check-in posts. In addition, many companies and organizations publish their business information on their own websites, resulting in a large number of POIs. In this paper, we propose a system that constructs a POI database from the immense data of the Web. Our system consists of two parts: a query-based crawler and a POI extraction system. The goal of the query-based crawler is to collect address-bearing pages (ABPs) from the web, since an address is a good indicator of a POI. The second part is the POI extraction system. We use CRFs (Conditional Random Fields) to train a Chinese postal address recognition model and a Chinese organization name recognition model. After extracting addresses and POI names from ABPs with these two CRF models, we learn a model to pair an address with a POI name to form a POI. Finally, we extract the associated information for each POI to construct a complete POI record.
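The extraction and pairing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the two CRF models emit per-character BIO tags (a common convention for sequence labeling, not stated in the abstract), and it replaces the learned pairing model with a simple stand-in heuristic that pairs each address with the nearest preceding organization name. The names `Span`, `decode_bio`, and `pair_by_proximity`, and the sample text, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    kind: str   # "ADDR" or "ORG" (assumed label names)
    start: int  # index of the first character
    end: int    # index one past the last character
    text: str


def decode_bio(chars, tags):
    """Turn per-character BIO tags (as a sequence labeler might emit) into spans."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            spans.append(Span(kind, start, i, "".join(chars[start:i])))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def pair_by_proximity(spans):
    """Stand-in for the learned pairing model: pair each address with the
    closest organization name that precedes it on the page."""
    pois, last_org = [], None
    for span in sorted(spans, key=lambda s: s.start):
        if span.kind == "ORG":
            last_org = span
        elif span.kind == "ADDR" and last_org is not None:
            pois.append((last_org.text, span.text))
    return pois


# Hypothetical page fragment and hand-written tags standing in for CRF output.
text = "範例餐廳:台北市中正路1號"
tags = ["B-ORG"] + ["I-ORG"] * 3 + ["O"] + ["B-ADDR"] + ["I-ADDR"] * 7
spans = decode_bio(list(text), tags)
pois = pair_by_proximity(spans)
```

Here `pois` holds (name, address) tuples. A learned pairing model, as the abstract describes, would replace the proximity rule with a classifier over candidate name and address pairs.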

Translated title of the contribution: Points of interest extraction from unstructured web
Original language: Traditional Chinese
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 180-195
Number of pages: 16
ISBN (Electronic): 9789573079286
Publication status: Published - 1 Oct 2015
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 1 Oct 2015 to 2 Oct 2015

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country/Territory: Taiwan
City: Hsinchu
Period: 1/10/15 to 2/10/15

Keywords

  • Electronic map
  • Information extraction
  • POI database
  • Web crawler
