Evaluation via negativa of Chinese word segmentation for information retrieval

Mike Tian Jian Jiang, Cheng Wei Shih, Chan Hung Kuo, Richard Tzong Han Tsai, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Numerous studies have analyzed the influence of word segmentation (WS) performance on information retrieval (IR) for Mandarin Chinese and have demonstrated a non-monotonic relationship between WS accuracy and IR effectiveness. The usefulness of compound words, which have been a focus of the IR literature, is not reflected by the common WS evaluation metrics of word-based precision (P) and recall (R). This investigation proposes two alternative measurements of WS accuracy, true negative rate (TNR) and negative predictive value (NPV), which are based on negative segments annotated against four standards of reference corpora, and compares them with P and R through search engine simulation. Accuracy-controlled WS systems segment queries for the simulation, which includes NTCIR collections and Sogou logs. Mean average precision (MAP) estimates the similarity of search results between the original and segmented queries. The statistics demonstrate that TNR and NPV are generally more closely correlated with MAP than P and R are.
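
The abstract does not spell out how negative segments are identified, so the following is only a minimal sketch under a stated assumption: every gap between two adjacent characters is treated as a binary decision, where placing a word boundary is "positive" and leaving the characters joined is "negative". Under that assumption, TNR = TN / (TN + FP) and NPV = TN / (TN + FN) follow their standard definitions, while P and R are computed over whole words. The function names `spans`, `ratio`, and `segmentation_metrics` are hypothetical, and the single-reference comparison here does not reproduce the paper's four-standard annotation.

```python
def spans(words):
    """Map a segmentation to its set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def ratio(num, den):
    """Divide safely; return NaN when the denominator is zero."""
    return num / den if den else float("nan")


def segmentation_metrics(gold_words, pred_words):
    """Word-based P and R, plus gap-level TNR and NPV (assumed definition)."""
    n = sum(len(w) for w in gold_words)
    if n != sum(len(w) for w in pred_words):
        raise ValueError("segmentations cover different text lengths")

    gold_spans, pred_spans = spans(gold_words), spans(pred_words)

    # Internal gaps are positions 1..n-1; a gap is positive if a boundary sits there.
    gold_cuts = {end for _, end in gold_spans} - {n}
    pred_cuts = {end for _, end in pred_spans} - {n}
    gaps = set(range(1, n))

    fp = len(pred_cuts - gold_cuts)         # boundary predicted, absent in reference
    fn = len(gold_cuts - pred_cuts)         # boundary in reference, not predicted
    tn = len(gaps - gold_cuts - pred_cuts)  # both leave the characters joined

    matched = len(gold_spans & pred_spans)  # words with identical start and end
    return {
        "P": ratio(matched, len(pred_spans)),
        "R": ratio(matched, len(gold_spans)),
        "TNR": ratio(tn, tn + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Illustrative (made-up) example: the prediction keeps a compound word whole.
reference = ["中国", "人民", "银行"]
predicted = ["中国人民", "银行"]
print(segmentation_metrics(reference, predicted))
```

In this made-up example the predicted segmentation merges two reference words into one compound. Word-based P drops to 0.5 and R to about 0.33, whereas TNR stays at 1.0 and NPV is 0.75, because no boundary was inserted where the reference has none. This is one way to see why negative-based measures might penalize compound-preserving segmentations less than P and R do.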

Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 100-109
Number of pages: 10
State: Published - 2011
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25, Singapore
Duration: 16 Dec 2011 to 18 Dec 2011

Publication series

Name: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Conference

Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Country/Territory: Singapore
Period: 16/12/11 to 18/12/11

Keywords

  • Information retrieval
  • True negative rate
  • Word segmentation
