Automatic Punctuation Restoration for corpus in Traditional Chinese Language using Deep Learning

Yu Chieh Chao, Chia Hui Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic Speech Recognition (ASR) has already been applied in several chat apps, allowing people to dictate messages instead of typing them. ASR has also been used to transcribe meeting minutes from audio recordings. However, two main problems make ASR systems unsuitable for some formal settings: words misrecognized by the system and missing punctuation marks, both of which degrade readability and may convey the wrong meaning. In this work, we aim to build a model that automatically restores punctuation marks in text generated by ASR systems; however, because no such labeled data exist for our ASR corpus, we train and test our model entirely on the corresponding transcript data. This research focuses on automatic punctuation restoration for Traditional Chinese corpora using neural network models. Our results show that a bidirectional Gated Recurrent Unit (GRU) with an attention mechanism outperforms the other models on our punctuation restoration task when the amount of training data is limited.
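
The abstract says the model is trained and tested on transcript data but does not spell out how labels are derived. One common formulation, assumed here purely for illustration and not confirmed by the source, treats punctuation restoration as per-character sequence labeling: punctuation is stripped from the transcript, and each remaining character is tagged with the mark that followed it. The punctuation set, the "O" tag, and the function names below are all hypothetical.

```python
# Hypothetical label inventory; the abstract does not list the marks the paper uses.
PUNCTUATION = {"，", "。", "？", "！", "、", "；", "："}
NO_PUNCT = "O"  # assumed tag for "no punctuation follows this character"


def make_training_pair(text):
    """Split punctuated text into (characters, labels).

    Each label is the punctuation mark that directly follows the character
    in the original text, or NO_PUNCT if none does.
    """
    chars, labels = [], []
    for ch in text:
        if ch in PUNCTUATION:
            if labels:  # attach the mark to the preceding character; ignore leading marks
                labels[-1] = ch
        elif not ch.isspace():
            chars.append(ch)
            labels.append(NO_PUNCT)
    return chars, labels


def restore(chars, labels):
    """Re-insert punctuation from per-character labels (e.g. model predictions)."""
    return "".join(c + (l if l != NO_PUNCT else "") for c, l in zip(chars, labels))


if __name__ == "__main__":
    chars, labels = make_training_pair("今天天氣很好，我們去散步吧。")
    print(list(zip(chars, labels)))
    print(restore(chars, labels))
```

Under this assumed scheme, a sequence model such as the bidirectional GRU with attention named in the abstract would take `chars` as input and predict `labels`. Runs of consecutive marks collapse to the last one in this sketch, so it is lossy for text such as "！？".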

Original language: English
Title of host publication: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 91-96
Number of pages: 6
ISBN (Electronic): 9781665403801
State: Published - Dec 2020
Event: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020 - Taipei, Taiwan
Duration: 3 Dec 2020 to 5 Dec 2020

Publication series

Name: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020

Conference

Conference: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Country/Territory: Taiwan
City: Taipei
Period: 3/12/20 to 5/12/20

Keywords

  • Automatic punctuation restoration
  • Deep Learning
