TY - GEN
T1 - A maximum entropy approach to Chinese grapheme-to-phoneme conversion
AU - Tsai, Richard Tzong Han
AU - Wang, Yu Chun
PY - 2009
Y1 - 2009
N2 - Grapheme-to-phoneme (G2P) conversion plays an important role in speech synthesis. The main difficulty facing Chinese G2P conversion is that many Chinese characters are polyphonic, having more than one pronunciation. A Chinese G2P system must be able to pick the correct pronunciation from among several candidates. Contextual information on neighboring characters, such as character n-grams, phonetic information, or the position of the polyphone in a word or sentence, is the key to correct prediction. Most previous work employed rule-based or rule-learning methods, which often suffered from data sparseness. In this paper, we propose a novel G2P approach to avoid data sparseness. Our method uses the maximum entropy (ME) model framework to represent contextual information as ME features. Our system achieves a top accuracy of 99.84%, which is significantly higher than that of other state-of-the-art rule-based and rule-learning methods. In addition, our approach consistently improves accuracy regardless of a character's main pronunciation ratio. Further analysis also shows that the ME model is fast and efficient, requiring much less training and labeling time.
KW - Chinese grapheme-to-phoneme conversion
KW - Maximum entropy model
KW - Speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=70449408051&partnerID=8YFLogxK
U2 - 10.1109/IRI.2009.5211588
DO - 10.1109/IRI.2009.5211588
M3 - Conference contribution
AN - SCOPUS:70449408051
SN - 9781424441167
T3 - 2009 IEEE International Conference on Information Reuse and Integration, IRI 2009
SP - 411
EP - 416
BT - 2009 IEEE International Conference on Information Reuse and Integration, IRI 2009
T2 - 2009 IEEE International Conference on Information Reuse and Integration, IRI 2009
Y2 - 10 August 2009 through 12 August 2009
ER -