TY - JOUR
T1 - Personal spoken sentence retrieval using two-level feature matching and MPEG-7 audio LLDs
AU - Lin, Po Chuan
AU - Wang, Jhing Fa
AU - Wang, Jia Ching
AU - Huang, Jun Jin
PY - 2009/7
Y1 - 2009/7
N2 - Conventional spoken sentence retrieval (SSR) relies on a large-vocabulary continuous speech recognition (LVCSR) system. This investigation proposes a feature-based, speaker-dependent SSR algorithm using two-level matching. Users speak keywords as query inputs to obtain similarity ranks from a spoken sentence database. For instance, if a user is looking for the relevant personal spoken sentence "October 12, I have a meeting in New York" in the database, an appropriate query input could be "meeting", "New York" or "October". In the first level, a Similar Frame Tagging scheme is proposed to locate possible segments of the database sentences that are similar to the user's query utterance. In the second level, a Fine Similarity Evaluation between the query and each possible segment is performed. Because the comparison is feature-based, the proposed algorithm does not require acoustic and language models; thus our SSR algorithm is language independent. Effective feature selection is the next issue in this paper. In addition to the conventional mel frequency cepstrum coefficients (MFCCs), several MPEG-7 audio low-level descriptors (LLDs) are also used as features to explore their ability for SSR. Experimental results revealed that the retrieval performance using MPEG-7 audio LLDs was close to that of the MFCCs. The combination of MPEG-7 audio LLDs and the MFCCs could further improve the retrieval precision. Because it relies on feature-based matching, the proposed algorithm has the advantages of being language independent and free of speaker-dependent training. Compared with the original methods [10, 11], with a precision decrease of only 0.026 to 0.05, the numbers of additions and multiplications are reduced by around a factor of lq (the number of frames in the query). It is particularly suitable for use in resource-limited devices.
AB - Conventional spoken sentence retrieval (SSR) relies on a large-vocabulary continuous speech recognition (LVCSR) system. This investigation proposes a feature-based, speaker-dependent SSR algorithm using two-level matching. Users speak keywords as query inputs to obtain similarity ranks from a spoken sentence database. For instance, if a user is looking for the relevant personal spoken sentence "October 12, I have a meeting in New York" in the database, an appropriate query input could be "meeting", "New York" or "October". In the first level, a Similar Frame Tagging scheme is proposed to locate possible segments of the database sentences that are similar to the user's query utterance. In the second level, a Fine Similarity Evaluation between the query and each possible segment is performed. Because the comparison is feature-based, the proposed algorithm does not require acoustic and language models; thus our SSR algorithm is language independent. Effective feature selection is the next issue in this paper. In addition to the conventional mel frequency cepstrum coefficients (MFCCs), several MPEG-7 audio low-level descriptors (LLDs) are also used as features to explore their ability for SSR. Experimental results revealed that the retrieval performance using MPEG-7 audio LLDs was close to that of the MFCCs. The combination of MPEG-7 audio LLDs and the MFCCs could further improve the retrieval precision. Because it relies on feature-based matching, the proposed algorithm has the advantages of being language independent and free of speaker-dependent training. Compared with the original methods [10, 11], with a precision decrease of only 0.026 to 0.05, the numbers of additions and multiplications are reduced by around a factor of lq (the number of frames in the query). It is particularly suitable for use in resource-limited devices.
KW - Audio low level descriptors
KW - Feature-based comparison
KW - Matching algorithm
KW - MPEG-7
KW - Spoken sentence retrieval
UR - http://www.scopus.com/inward/record.url?scp=69549085115&partnerID=8YFLogxK
M3 - Journal article
AN - SCOPUS:69549085115
SN - 1016-2364
VL - 25
SP - 1221
EP - 1238
JO - Journal of Information Science and Engineering
JF - Journal of Information Science and Engineering
IS - 4
ER -