Transliteration extraction from classical Chinese Buddhist literature using conditional random fields

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language preferences among translators. To assist historical linguistics and digital humanities researchers, this paper proposes a transliteration extraction method based on conditional random fields (CRFs), using features derived from the characteristics of the Chinese characters used in transliterations, which help identify transliteration characters. To evaluate the method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline that combines a suffix-array-based extraction method with phonetic similarity measurement. Our method substantially outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. These results show that the method is effective for extracting transliterations from classical Chinese texts.
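The abstract does not spell out the feature set or tagging scheme, so the following is only a minimal sketch, under stated assumptions, of how a linear-chain CRF can label each character of a classical Chinese sentence as the beginning (B), inside (I), or outside (O) of a transliteration and then decode the best label sequence with the Viterbi algorithm. The character set `TRANSLIT_CHARS`, the feature names (`translit_char`, `prev_translit`), the BIO scheme, and all weights are hypothetical; a real CRF would learn its weights from annotated text rather than use hand-set values.

```python
from typing import Dict, List, Tuple

LABELS = ["B", "I", "O"]  # hypothetical BIO tags: Begin / Inside a transliteration, Outside

# Hypothetical set of characters frequently used in Buddhist transliterations.
TRANSLIT_CHARS = set("舍利弗阿難迦葉須菩提羅睺陀尼")

# Hypothetical hand-set weights; a trained CRF would learn these from labelled data.
EMISSION_W: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 1.0,
    ("translit_char", "B"): 2.0,
    ("translit_char", "I"): 2.0,
    ("prev_translit", "I"): 0.5,
    ("prev_translit", "B"): -1.0,
}
TRANSITION_W: Dict[Tuple[str, str], float] = {
    ("START", "I"): -5.0,  # a span should not open with I
    ("O", "I"): -5.0,      # I should follow B or I
}


def features(text: str, i: int) -> List[str]:
    """Illustrative character-level features for position i."""
    feats = ["bias"]
    if text[i] in TRANSLIT_CHARS:
        feats.append("translit_char")
    if i > 0 and text[i - 1] in TRANSLIT_CHARS:
        feats.append("prev_translit")
    return feats


def emission(text: str, i: int, label: str) -> float:
    """Sum of weights for the active features paired with this label."""
    return sum(EMISSION_W.get((f, label), 0.0) for f in features(text, i))


def viterbi(text: str) -> List[str]:
    """Highest-scoring label sequence under the linear-chain CRF score."""
    if not text:
        return []
    # score[y]: best score of any label path that ends in label y at position i
    score = {y: TRANSITION_W.get(("START", y), 0.0) + emission(text, 0, y) for y in LABELS}
    backptrs: List[Dict[str, str]] = []
    for i in range(1, len(text)):
        new_score: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION_W.get((p, y), 0.0))
            new_score[y] = score[prev] + TRANSITION_W.get((prev, y), 0.0) + emission(text, i, y)
            bp[y] = prev
        score = new_score
        backptrs.append(bp)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]


def extract_spans(text: str, tags: List[str]) -> List[str]:
    """Collect the character runs tagged B followed by zero or more I."""
    spans: List[str] = []
    current = ""
    for ch, tag in zip(text, tags):
        if tag == "B":
            if current:
                spans.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            if current:
                spans.append(current)
            current = ""
    if current:
        spans.append(current)
    return spans


if __name__ == "__main__":
    sentence = "舍利弗即從座起"  # short illustrative phrase containing 舍利弗 (Śāriputra)
    tags = viterbi(sentence)
    print(tags)                           # ['B', 'I', 'I', 'O', 'O', 'O', 'O']
    print(extract_spans(sentence, tags))  # ['舍利弗']
```

The transition penalties keep decoded spans well-formed (no I without a preceding B), while the emission features let a character's membership in the transliteration set, and its neighbour's, shift the decision. This mirrors, in a simplified and hypothetical form, the idea of using the characteristics of transliteration characters as CRF features.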

Original language: English
Title of host publication: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
Publisher: National Chengchi University
Pages: 260-266
Number of pages: 7
ISBN (Electronic): 9789860385670
State: Published - 2013
Event: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013 - Taipei, Taiwan
Duration: 21 Nov 2013 - 24 Nov 2013

Publication series

Name: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27

Conference

Conference: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Country/Territory: Taiwan
City: Taipei
Period: 21/11/13 - 24/11/13

