基於 Web 之商家景點擷取與資料庫建置

Translated title of the contribution: Points of interest extraction from unstructured web

Ting Yao Kao, Hsiu Min Chuang, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

With the growing popularity of mobile devices, local search has become a popular new service, and it requires a comprehensive POI (Points of Interest) database. In recent years, the web has become the largest source of POI data. People share their travel experiences and information about the POIs they have visited on social networks, in blogs, and in check-in posts. In addition, many companies and organizations publish their business information on their own websites, resulting in a large number of POIs. In this paper, we propose a system that constructs a POI database from the vast amount of data on the Web. The system consists of two parts: a query-based crawler and a POI extraction system. The goal of the query-based crawler is to collect address-bearing pages (ABP) from the web, since an address is a good indicator of a POI. The POI extraction system uses CRF (Conditional Random Field) to train a Chinese postal address recognition model and a Chinese organization recognition model. After extracting addresses and POI names from ABPs with these two CRF models, we learn a model that pairs an address with a POI name to form a POI. Finally, we extract the associated information for each POI to construct a complete POI record.
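The extraction and pairing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a CRF tagger has already assigned per-character BIO tags, and the tag names `ORG` and `ADDR`, the `Span` type, and both function names are hypothetical. The paper learns a pairing model; the sketch substitutes a simple nearest-span heuristic for that step, and the sample text is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    text: str   # surface string of the entity
    start: int  # character offset where the span begins
    end: int    # character offset one past the last character
    label: str  # hypothetical label: "ORG" (POI name) or "ADDR" (address)


def bio_to_spans(text: str, tags: List[str]) -> List[Span]:
    """Collapse per-character BIO tags (B-X, I-X, O) into labeled spans."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append(Span(text[start:i], start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def pair_nearest(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair each address with the closest POI name by character distance.

    Placeholder heuristic only; the paper trains a model for this step.
    """
    names = [s for s in spans if s.label == "ORG"]
    addrs = [s for s in spans if s.label == "ADDR"]
    pairs: List[Tuple[str, str]] = []
    for addr in addrs:
        if not names:
            break
        best = min(
            names,
            key=lambda n: max(addr.start - n.end, n.start - addr.end, 0),
        )
        pairs.append((best.text, addr.text))
    return pairs


# Made-up page text: a 4-character name, a space, an 8-character address.
sample_text = "好味餐廳 台北市中山路1號"
sample_tags = ["B-ORG"] + ["I-ORG"] * 3 + ["O"] + ["B-ADDR"] + ["I-ADDR"] * 7
sample_spans = bio_to_spans(sample_text, sample_tags)
print(pair_nearest(sample_spans))
```

Tagging at the character level avoids a separate word-segmentation step for Chinese text, which is a common design choice for sequence labeling but is an assumption here rather than a detail stated in the abstract.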

Translated title of the contribution: Points of interest extraction from unstructured web
Original language: Chinese (Traditional)
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 180-195
Number of pages: 16
ISBN (Electronic): 9789573079286
State: Published - 1 Oct 2015
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 1 Oct 2015 to 2 Oct 2015

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country/Territory: Taiwan
City: Hsinchu
Period: 1/10/15 to 2/10/15
