TY - JOUR
T1 - Using Hybrid Models for Action Correction in Instrument Learning Based on AI
AU - Enkhbat, Avirmed
AU - Shih, Timothy K.
AU - Gochoo, Munkhjargal
AU - Cheewaprakobkit, Pimpa
AU - Aditya, Wisnu
AU - Duy Quy, Thai
AU - Lin, Hsinchih
AU - Lin, Yu Ting
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Human action recognition has recently attracted much attention in computer vision research, with applications in video surveillance, human-computer interaction, entertainment, and autonomous driving. In this study, we developed a system for evaluating online music performances; the system assesses the playing of the erhu, the most popular traditional stringed instrument in East Asia. Mastering the erhu is challenging: players often struggle to improve their skills because of incorrect technique and a lack of guidance, resulting in limited progress. To address this issue, we propose hybrid models based on graph convolutional networks (GCN) and temporal convolutional networks (TCN) for action recognition, capturing both the spatial relationships between the joints (keypoints) of a human skeleton and the interactions between these joints over time. This can assist players in identifying errors while playing the instrument. We use RGB video as input, segmenting it into individual frames; for each frame, we extract keypoints, so that both image and keypoint information serve as input to our model. With this architecture, we achieve an accuracy exceeding 97% across the various hand-error classes, providing valuable insights into the assessment of musical performances and demonstrating the potential of AI-based solutions for learning and correcting complex human actions in interactive learning environments.
KW - Action recognition
KW - erhu performance evaluation
KW - graph convolutional networks (GCN)
KW - temporal convolutional networks (TCN)
UR - http://www.scopus.com/inward/record.url?scp=85203536536&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3454170
DO - 10.1109/ACCESS.2024.3454170
M3 - Journal article
AN - SCOPUS:85203536536
SN - 2169-3536
VL - 12
SP - 125319
EP - 125331
JO - IEEE Access
JF - IEEE Access
ER -