Evaluation via negativa of Chinese word segmentation for information retrieval

Mike Tian Jian Jiang, Cheng Wei Shih, Chan Hung Kuo, Richard Tzong Han Tsai, Wen Lian Hsu

Research output: Chapter in book/report, conference contribution, peer-reviewed

Abstract

Numerous studies have analyzed the influence of word segmentation (WS) performance on information retrieval (IR) for Mandarin Chinese and have demonstrated a non-monotonic relationship between WS accuracy and IR effectiveness. The usefulness of compound words, which have been a focus of the IR literature, is not reflected by the common WS evaluation metrics of word-based precision (P) and recall (R). This investigation proposes two alternative measures of WS accuracy, true negative rate (TNR) and negative predictive value (NPV), which are based on negative segments annotated against four standards of reference corpora, and compares them with P and R through search engine simulation. Accuracy-controlled WS systems segment the queries for the simulation, which includes NTCIR collections and Sogou logs. Mean average precision (MAP) estimates the similarity of search results between the original and segmented queries. The statistics demonstrate that TNR and NPV are generally more closely correlated with MAP than are P and R.
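
The contrast between the two pairs of metrics can be illustrated with a minimal sketch. It rests on an assumption that is not taken from the paper: each gap between adjacent characters is treated as a binary decision, with "no break" as the negative class, so that TNR = TN / (TN + FP) and NPV = TN / (TN + FN). The paper's own definition of negative segments, annotated against reference corpora, may differ. The function names `boundaries`, `spans`, and `evaluate` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def boundaries(words: List[str]) -> Set[int]:
    """Internal character offsets at which a segmentation places a break."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """(start, end) character offsets of every word in a segmentation."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def evaluate(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Word-based P and R, plus gap-level TNR and NPV (assumed definition)."""
    if not gold or not pred or "".join(gold) != "".join(pred):
        raise ValueError("gold and pred must segment the same non-empty string")
    n_gaps = len("".join(gold)) - 1
    g, p = boundaries(gold), boundaries(pred)
    tn = n_gaps - len(g | p)  # gaps where neither segmentation breaks
    fp = len(p - g)           # breaks predicted but absent from gold
    fn = len(g - p)           # gold breaks that the prediction missed
    correct = len(spans(gold) & spans(pred))
    return {
        "P": correct / len(pred),
        "R": correct / len(gold),
        "TNR": tn / (tn + fp) if tn + fp else 0.0,
        "NPV": tn / (tn + fn) if tn + fn else 0.0,
    }


# A predicted compound word ("信息检索") that the gold standard splits in two.
scores = evaluate(gold=["信息", "检索", "系统"], pred=["信息检索", "系统"])
print(scores)
```

In this example the predicted compound lowers word-based P to 0.5 and R to 1/3, while TNR stays at 1.0 because no spurious break was introduced; only NPV (0.75) registers the missed break. Under the stated assumption, this is one way a compound that may still be useful for retrieval is penalized less heavily by the negative-class metrics.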

Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 100-109
Number of pages: 10
Publication status: Published - 2011
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 - Singapore
Duration: 16 Dec 2011 to 18 Dec 2011

Publication series

Name: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Conference

Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Country/Territory: Singapore
Period: 16/12/11 to 18/12/11
