Self-Defined Text-Dependent Wake-Up-Words Speaker Recognition System

Tsung Han Tsai, Ping Cheng Hao, Chiao Li Wang

Research output: Contribution to journalArticlepeer-review

6 Scopus citations


In recent years, wake-up-words (WUW) technology is highly developed in some speaker recognition system. It is the progress of verifying a person's claimed identity from their voice characteristics, and can be efficiently deployed in some consumer applications. In this paper, we proposed a self-defined text-dependent wake-up-words (WUW) speaker recognition system and its implementation. The whole system is divided into two phases: training phase and testing phase. In the training phase, a wake-up word by language is recorded, and the voice segment is cut out by using Voice Activity Detection (VAD). Then we use the Mel-Frequency Cepstral Coefficients (MFCC) as the pre-processing to extract the speech features. After obtaining the speech features, we use Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) simultaneously for training. In the testing phase, we build GMM and HMM continuously and use the Levenshtein Distance (LD) to calculate the differences of the state sequences between the dataset and the unknown speech input. If the unknown speech input passes the threshold, then it means a wake-up event is derived. The experimental results show that the average accuracy is 93.31 %, 82.42% and 3.38 % in 10dB, 5dB and 0dB of Signal Noise Ratio (SNR) respectively. The CPU and memory usage of entire system is around 757 MIPS and 40MB respectively.

Original languageEnglish
Pages (from-to)138668-138676
Number of pages9
JournalIEEE Access
StatePublished - 2021


  • customized wake-up word
  • Gaussian mixture model
  • hidden Markov model
  • mel-frequency cepstral coefficients
  • real-time operation
  • Speaker recognition


Dive into the research topics of 'Self-Defined Text-Dependent Wake-Up-Words Speaker Recognition System'. Together they form a unique fingerprint.

Cite this