Transliteration extraction from classical Chinese Buddhist literature using conditional random fields

Research output: Chapter in book/report, Conference paper, Peer-reviewed

1 citation (Scopus)

Abstract

Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language preferences among translators. To assist researchers in historical linguistics and the digital humanities, this paper proposes a transliteration extraction method based on conditional random fields, with features derived from the characteristics of the Chinese characters used in transliterations, which are well suited to identifying transliteration characters. To evaluate our method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline approach that combines suffix-array-based extraction with a phonetic similarity measure. Our method substantially outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that our method is very effective at extracting transliterations from classical Chinese texts.
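
As a rough illustration of how character-level sequence labeling for this kind of task can be set up, the sketch below builds per-character feature dictionaries, encodes transliteration spans as B/I/O tags, and scores predicted spans by precision and recall. The abstract does not give the actual feature set or tagging scheme, so everything here is an assumption: the toy character list `TRANSLIT_CHARS`, the feature names, the B/I/O encoding, and the exact-match scoring are all hypothetical choices, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy list of characters that often occur in transliterations.
# The paper's real feature set is not described in the abstract.
TRANSLIT_CHARS = set("阿羅迦陀摩尼婆")

Span = Tuple[int, int]  # half-open character offsets [start, end)


def char_features(sentence: str, i: int) -> Dict[str, object]:
    """Build a feature dict for the character at position i."""
    return {
        "char": sentence[i],
        "in_translit_list": sentence[i] in TRANSLIT_CHARS,
        "prev_char": sentence[i - 1] if i > 0 else "<BOS>",
        "next_char": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
    }


def bio_labels(sentence: str, spans: List[Span]) -> List[str]:
    """Encode transliteration spans as one B/I/O tag per character."""
    labels = ["O"] * len(sentence)
    for start, end in spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


def decode_spans(labels: List[str]) -> List[Span]:
    """Recover half-open spans from a B/I/O tag sequence."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(labels):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I tag with no preceding B
                start = i
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def precision_recall(gold: List[Span], pred: List[Span]) -> Tuple[float, float]:
    """Exact-match span precision and recall."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    return precision, recall


# Example: in "佛告阿難", the name 阿難 (Ananda) occupies offsets 2..4.
sentence = "佛告阿難"
features = [char_features(sentence, i) for i in range(len(sentence))]
labels = bio_labels(sentence, [(2, 4)])
```

Feature dictionaries and tag sequences in this shape are what a CRF toolkit would typically be trained on; the trained model's predicted tags could then be passed through `decode_spans` and compared with the gold spans using `precision_recall`.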

Original language: English
Title of host publication: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
Publisher: National Chengchi University
Pages: 260-266
Number of pages: 7
ISBN (electronic): 9789860385670
Publication status: Published - 2013
Event: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013 - Taipei, Taiwan
Duration: 21 Nov 2013 to 24 Nov 2013

Publication series

Name: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27

Conference

Conference: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Country/Territory: Taiwan
City: Taipei
Period: 21/11/13 to 24/11/13
