Fine-Tuning Vision Transformer for Arabic Sign Language Video Recognition on Augmented Small-Scale Dataset

Munkhjargal Gochoo, Ganzorig Batnasan, Ahmed Abdelhadi Ahmed, Munkh Erdene Otgonbold, Fady Alnajjar, Timothy K. Shih, Tan Hsu Tan, Lai Khin Wee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

With the rise of AI, sign-to-text recognition of Sign Language (SL) has gained significance in computer vision and deep learning. However, only a few medium-to-large open datasets are available for this task, because building one requires recording thousands of signs for words and phrases in different environments, which is time-consuming and tedious. Furthermore, very little effort has gone into Arabic Sign Language Recognition (ArSLR). This paper presents the results of fine-tuning the Vision Transformer (ViT) model on a small-scale in-house Arabic Sign Language (ArSL) dataset. The main goal is to attain satisfactory results with minimal computing power and a small dataset involving fewer than 10 individuals, with only one recording made for each sign in every environment. The dataset comprises 49 classes (signs), all of which are made with two hands and belong to the Level I category in terms of popularity. To enlarge the dataset, three types of augmentation were employed: translation, shear, and rotation. The ViT model, pre-trained on the Kinetics dataset, was trained on variants of the augmented dataset with 2 to 40 times the samples for each original video; the training set includes original and augmented videos of 8 volunteers, and the test set includes only original videos of one particular volunteer. Experimental results reveal that the combination of rotation and shear outperformed the other augmentation settings, achieving an accuracy of 93% on the dataset with 20 times augmented samples per class per signer. We believe this study sheds light on small-scale dataset-based SLR tasks and on video/action recognition in general.
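The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`affine_matrix`, `warp_frame`, `augment_clip`, `expand_dataset`), the parameter ranges, the nearest-neighbour resampling, and the grayscale nested-list frame format are all assumptions chosen to keep the example dependency-free. The one idea it tries to show is that rotation, shear, and translation parameters are sampled once per clip and then applied identically to every frame, so that the motion of the sign is preserved.

```python
import math
import random


def affine_matrix(angle_deg=0.0, shear_deg=0.0, tx=0.0, ty=0.0):
    """2x3 matrix: rotate, then shear along x, then translate (pixels)."""
    a = math.radians(angle_deg)
    s = math.tan(math.radians(shear_deg))
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Linear part is [[1, s], [0, 1]] @ [[cos, -sin], [sin, cos]].
    return [
        [cos_a + s * sin_a, -sin_a + s * cos_a, tx],
        [sin_a, cos_a, ty],
    ]


def warp_frame(frame, m, fill=0):
    """Warp one grayscale frame (list of rows) about its centre."""
    h, w = len(frame), len(frame[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            u, v = x - cx - tx, y - cy - ty
            sx = (d * u - b * v) / det + cx
            sy = (-c * u + a * v) / det + cy
            ix, iy = round(sx), round(sy)  # nearest-neighbour sampling
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = frame[iy][ix]
    return out


def augment_clip(frames, rng, max_angle=10.0, max_shear=10.0, max_shift=0.0):
    """Sample one transform and apply it to every frame of the clip.

    The default ranges are illustrative assumptions; the default
    max_shift=0 gives the rotation + shear combination.
    """
    m = affine_matrix(
        angle_deg=rng.uniform(-max_angle, max_angle),
        shear_deg=rng.uniform(-max_shear, max_shear),
        tx=rng.uniform(-max_shift, max_shift),
        ty=rng.uniform(-max_shift, max_shift),
    )
    return [warp_frame(f, m) for f in frames]


def expand_dataset(clips, copies, seed=0):
    """Return each original clip followed by `copies` augmented versions."""
    rng = random.Random(seed)
    out = []
    for frames in clips:
        out.append(frames)
        out.extend(augment_clip(frames, rng) for _ in range(copies))
    return out
```

Under these assumptions, `expand_dataset(train_clips, copies=20)` would be applied to the training signers only, while the held-out signer's clips stay unaugmented, mirroring the split described above. Whether the reported multipliers count the original video is not stated in the abstract, so `copies` here counts augmented versions only. A practical pipeline would more likely use an array or image library with interpolated resampling.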

Original language: English
Title of host publication: 2023 IEEE International Conference on Systems, Man, and Cybernetics
Subtitle of host publication: Improving the Quality of Life, SMC 2023 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2880-2885
Number of pages: 6
ISBN (Electronic): 9798350337020
DOIs
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2023 - Hybrid, Honolulu, United States
Duration: 1 Oct 2023 to 4 Oct 2023

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
ISSN (Print): 1062-922X

Conference

Conference: 2023 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2023
Country/Territory: United States
City: Hybrid, Honolulu
Period: 1/10/23 to 4/10/23
