TY - JOUR
T1 - Web-based pattern learning for named entity translation in Korean-Chinese cross-language information retrieval
AU - Wang, Yu Chun
AU - Tsai, Richard Tzong Han
AU - Hsu, Wen Lian
PY - 2009/3
Y1 - 2009/3
N2 - Named entity (NE) translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating NEs from Korean to Chinese in order to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integrate two online databases to extend the coverage of our bilingual dictionaries. We use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. We also use Naver.com's people search engine to find a query name's Chinese or English translation. The second component of our system is able to learn Korean-Chinese (K-C), Korean-English (K-E), and English-Chinese (E-C) translation patterns from the web. These patterns can be used to extract K-C, K-E and E-C pairs from Google snippets. We found that KCIR performance using this hybrid configuration was over five times better than that of a dictionary-based configuration using only Naver people search. Mean average precision was as high as 0.3385 and recall reached 0.7578. Our method can handle Chinese, Japanese, Korean, and non-CJK NE translation and improve performance of KCIR substantially.
AB - Named entity (NE) translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating NEs from Korean to Chinese in order to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integrate two online databases to extend the coverage of our bilingual dictionaries. We use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. We also use Naver.com's people search engine to find a query name's Chinese or English translation. The second component of our system is able to learn Korean-Chinese (K-C), Korean-English (K-E), and English-Chinese (E-C) translation patterns from the web. These patterns can be used to extract K-C, K-E and E-C pairs from Google snippets. We found that KCIR performance using this hybrid configuration was over five times better than that of a dictionary-based configuration using only Naver people search. Mean average precision was as high as 0.3385 and recall reached 0.7578. Our method can handle Chinese, Japanese, Korean, and non-CJK NE translation and improve performance of KCIR substantially.
KW - Korean-Chinese cross-language information retrieval
KW - Named entity translation
KW - Web-based pattern learning
UR - http://www.scopus.com/inward/record.url?scp=56349132625&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2008.02.067
DO - 10.1016/j.eswa.2008.02.067
M3 - Journal article
AN - SCOPUS:56349132625
SN - 0957-4174
VL - 36
SP - 3990
EP - 3995
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - 2 PART 2
ER -