Deep Learning Based Vietnamese Diacritics Restoration

Cao Hong Nga, Nguyen Khai Thinh, Pao Chi Chang, Jia Ching Wang

Research output: Chapter in book/report › Conference contribution › Peer-reviewed

4 Citations (Scopus)


Diacritics are essential in languages that use them, because the meaning of a sentence can change with its diacritics. Writing without diacritics makes sentences ambiguous; nevertheless, people omit diacritics for several reasons, such as typing speed, convenience, or texting on devices that do not support diacritics. As a result, such texts are very difficult to process in downstream natural language processing (NLP) tasks such as machine translation, sentiment analysis, or question answering. Diacritics restoration is therefore critical for further use or processing in NLP-related tasks. In this study, we propose a method that combines a convolutional neural network (CNN) and a bidirectional gated recurrent unit (Bi-GRU) to restore diacritics. In addition, we use a residual block to address the vanishing gradient problem of recurrent neural networks. We applied the model to restoring diacritics in Vietnamese, which has the highest ratio of diacritics in words. This approach achieves a character accuracy of 98.63% and a word accuracy of 94.77%.
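The task and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `strip_diacritics`, `character_accuracy`, and `word_accuracy` are hypothetical, and the metric definitions (position-wise character matches and whitespace-token matches) are assumptions, since the abstract does not state how accuracy was computed.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove diacritics from Vietnamese text (illustrative helper, not from the paper)."""
    # "đ"/"Đ" have no canonical Unicode decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def character_accuracy(predicted: str, reference: str) -> float:
    """Assumed definition: fraction of character positions that match."""
    # NFC yields one code point per Vietnamese letter, so positions line up.
    predicted = unicodedata.normalize("NFC", predicted)
    reference = unicodedata.normalize("NFC", reference)
    if len(predicted) != len(reference):
        raise ValueError("restored text must have the same length as the reference")
    if not reference:
        return 1.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def word_accuracy(predicted: str, reference: str) -> float:
    """Assumed definition: fraction of whitespace-separated words that match."""
    pred_words = unicodedata.normalize("NFC", predicted).split()
    ref_words = unicodedata.normalize("NFC", reference).split()
    if len(pred_words) != len(ref_words):
        raise ValueError("restored text must have the same word count as the reference")
    if not ref_words:
        return 1.0
    matches = sum(p == r for p, r in zip(pred_words, ref_words))
    return matches / len(ref_words)


if __name__ == "__main__":
    reference = "Tôi đi học"            # ground-truth sentence
    model_input = strip_diacritics(reference)
    print(model_input)                  # Toi di hoc
    prediction = "Tôi đi hóc"           # hypothetical model output, one wrong tone mark
    print(character_accuracy(prediction, reference))  # 0.9
    print(word_accuracy(prediction, reference))
```

Stripping diacritics from correctly written text is one plausible way to build (input, target) training pairs for such a model. A single wrong tone mark costs one character but a whole word, which is consistent with word accuracy being lower than character accuracy.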

Title of host publication: Proceedings - 2019 IEEE International Symposium on Multimedia, ISM 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication status: Published - Dec 2019
Event: 21st IEEE International Symposium on Multimedia, ISM 2019 - San Diego, United States
Duration: 9 Dec 2019 to 11 Dec 2019




Conference: 21st IEEE International Symposium on Multimedia, ISM 2019
Country/Territory: United States
City: San Diego

