Speech segmentation plays an important role in large-vocabulary continuous speech recognition systems because it reduces memory requirements and computational complexity. In this paper, we formulate speech segmentation as a two-phase problem. Phase 1 (frame labeling) labels the frames of speech data. Frames are classified into three types, (1) silence, (2) consonants, and (3) vowels, according to two segmentation features. In phase 2 (syllabic unit segmentation), we apply the concept of transition states to segment continuous speech data into syllabic units based on the labeled frames. A novel class of hyperrectangular composite neural networks (HRCNNs) is used to cluster frames. HRCNNs integrate the rule-based approach and the neural network paradigm, so this hybrid system may offset the disadvantages of each alternative. The parameters of the trained HRCNNs are used to extract both crisp and fuzzy classification rules. Continuous reading-rate Mandarin speech from four speakers is used to illustrate the proposed two-phase speech segmentation model. In our experiments, the HRCNNs outperform the 'Distributed Fuzzy Rule' approach in terms of both the number of rules and the correct recognition rate.
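The two-phase idea can be illustrated with a minimal sketch. The abstract does not name the two segmentation features, give the trained hyperrectangle bounds, or spell out the transition-state procedure, so everything concrete below is an assumption made for illustration: the features (normalized energy and zero-crossing rate), the `RULES` bounds, and the simple "optional consonant run followed by a vowel run" grouping that stands in for phase 2. A crisp rule is taken here to mean "the frame belongs to a class if its feature vector lies inside one of that class's hyperrectangles".

```python
from itertools import groupby

# ASSUMPTION: hypothetical hyperrectangles over two made-up features,
# (normalized energy, normalized zero-crossing rate). Each box is a
# (lower corner, upper corner) pair. These are not values from the paper.
RULES = {
    "silence":   [((0.0, 0.0), (0.1, 1.0))],
    "consonant": [((0.1, 0.5), (1.0, 1.0))],
    "vowel":     [((0.1, 0.0), (1.0, 0.5))],
}


def inside(x, box):
    """Crisp rule: true if every feature lies within the box bounds."""
    lower, upper = box
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def label_frame(x, rules=RULES, default="silence"):
    """Phase 1: return the first class with a hyperrectangle containing x."""
    for label, boxes in rules.items():
        if any(inside(x, box) for box in boxes):
            return label
    return default


def label_runs(labels):
    """Collapse frame labels into (label, start, end) runs, end exclusive."""
    runs, pos = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        runs.append((label, pos, pos + n))
        pos += n
    return runs


def syllabic_units(labels):
    """Phase 2 stand-in (ASSUMPTION, not the paper's transition-state method):
    a unit is a vowel run, optionally preceded by a consonant run."""
    runs = label_runs(labels)
    units, i = [], 0
    while i < len(runs):
        label, start, end = runs[i]
        next_is_vowel = i + 1 < len(runs) and runs[i + 1][0] == "vowel"
        if label == "consonant" and next_is_vowel:
            units.append((start, runs[i + 1][2]))
            i += 2
        elif label == "vowel":
            units.append((start, end))
            i += 1
        else:
            i += 1  # silence or an isolated consonant run: no unit emitted
    return units


# Illustrative frames (made-up feature values).
frames = [(0.05, 0.2), (0.05, 0.3), (0.4, 0.8), (0.7, 0.2),
          (0.8, 0.1), (0.02, 0.5), (0.6, 0.3)]
labels = [label_frame(f) for f in frames]
units = syllabic_units(labels)
```

Each entry of `units` is a half-open frame range for one syllabic unit. In the paper the hyperrectangle bounds come from the trained HRCNN parameters rather than being hand-written as they are here.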
|Number of pages||8|
|State||Published - 1995|
|Event||Proceedings of the 1995 IEEE International Conference on Fuzzy Systems. Part 1 (of 5) - Yokohama, Japan|
Duration: 20 Mar 1995 → 24 Mar 1995