Deep Learning Based Vietnamese Diacritics Restoration

Cao Hong Nga, Nguyen Khai Thinh, Pao Chi Chang, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Diacritics are essential in languages that use them, because they can change the meaning of a sentence. Writing without diacritics makes sentences ambiguous; however, people omit them for several reasons, such as typing speed, convenience, or texting on devices that do not support diacritics. As a result, such texts are very difficult to process in downstream natural language processing (NLP) tasks such as machine translation, sentiment analysis, or question answering. Diacritics restoration is therefore critical for further use or processing in NLP-related tasks. In this study, we propose a method that combines a convolutional neural network (CNN) and a bidirectional gated recurrent unit (Bi-GRU) to restore diacritics. In addition, we use a residual block to mitigate the vanishing gradient problem of recurrent neural networks. We applied the model to restoring diacritics in Vietnamese, which has the highest ratio of diacritics in words. The approach achieves a character accuracy of 98.63% and a word accuracy of 94.77%.
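
To make the task and the two reported metrics concrete, here is a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the exact definitions of character and word accuracy used in the paper are not given in the abstract. The sketch assumes position-wise comparison, which is possible because restoring diacritics leaves the number of characters (in NFC form) and the number of words unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Produce the diacritic-free input a restoration model would receive."""
    # 'đ'/'Đ' have no Unicode decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category Mn) exposed by the decomposition.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def char_accuracy(predicted: str, reference: str) -> float:
    """Fraction of character positions that match (assumed definition)."""
    pred = unicodedata.normalize("NFC", predicted)
    ref = unicodedata.normalize("NFC", reference)
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same length")
    if not ref:
        return 1.0
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of whitespace-separated words that match (assumed definition)."""
    pred = unicodedata.normalize("NFC", predicted).split()
    ref = unicodedata.normalize("NFC", reference).split()
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same word count")
    if not ref:
        return 1.0
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


reference = "tôi đi học"            # ground truth with diacritics
model_input = strip_diacritics(reference)
predicted = "tôi đi hoc"            # hypothetical model output with one error
```

In this toy example one of ten characters and one of three words is wrong, which illustrates why word accuracy is typically lower than character accuracy: a single wrong character invalidates the whole word.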

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Symposium on Multimedia, ISM 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 331-334
Number of pages: 4
ISBN (Electronic): 9781728156064
State: Published - Dec 2019
Event: 21st IEEE International Symposium on Multimedia, ISM 2019 - San Diego, United States
Duration: 9 Dec 2019 - 11 Dec 2019

Publication series

Name: Proceedings - 2019 IEEE International Symposium on Multimedia, ISM 2019

Conference

Conference: 21st IEEE International Symposium on Multimedia, ISM 2019
Country/Territory: United States
City: San Diego
Period: 9/12/19 - 11/12/19

Keywords

  • convolutional neural network
  • diacritics
  • diacritics restoration
  • neural networks
  • recurrent neural network
