TY - GEN
T1 - Transliteration extraction from classical Chinese Buddhist literature using conditional random fields
AU - Wang, Yu Chun
AU - Tsai, Richard Tzong-Han
N1 - Publisher Copyright:
© 2013 by Yu-Chun Wang and Richard Tzong-Han Tsai.
PY - 2013
Y1 - 2013
N2 - Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language preferences among translators. To assist researchers in historical linguistics and the digital humanities, this paper proposes a transliteration extraction method based on conditional random fields, with features derived from the characteristics of the Chinese characters used in transliterations, which are well suited to identifying transliteration characters. To evaluate our method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline approach that combines a suffix-array-based extraction method with a phonetic similarity measure. Our method substantially outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that our method is very effective at extracting transliterations from classical Chinese texts.
UR - http://www.scopus.com/inward/record.url?scp=84922806508&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84922806508
T3 - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
SP - 260
EP - 266
BT - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
PB - National Chengchi University
T2 - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Y2 - 21 November 2013 through 24 November 2013
ER -