A sign language is a visual language combining gestures, body motions, and facial expressions. It serves as the major communication tool for hearing-impaired people, and it also provides an effective way for those who cannot speak to express themselves. Many people are interested in learning a sign language for various reasons: to help hearing-impaired people, to work as a sign language interpreter, to learn it as a second language or an additional skill, or to use signing to boost creativity and motor development in infants and children. Like any language in the world, sign languages differ across regions and countries. To include hearing-impaired people in a more closely connected society, it is necessary to popularize the learning of sign languages with approaches suited to local circumstances. This research aims to develop visual recognition techniques that facilitate the learning/training of Taiwan Sign Language (TSL) and to provide an assisting mechanism that enables interested learners to practice TSL by themselves. In the proposed system, basic terms/sentences selected by TSL experts are displayed on a monitor; the learner watches the videos and performs the corresponding sign language expressions in front of a camera. The captured video is then analyzed to evaluate whether the learner practices correctly. The proposed visual analysis scheme is based on state-of-the-art deep learning techniques. The project has three main parts: (1) extracting human keypoints specifically for TSL visual recognition, (2) recognizing TSL terms/sentences with attention-based models, and (3) developing a lightweight deep learning architecture for the TSL learning/training system.
First, we will employ Unity3D to construct a labelled human keypoint dataset suitable for the visual recognition of TSL. The extracted keypoints avoid the need for learners to wear sensors or gloves. The system uses an RGB camera, rather than a depth camera, to broaden the scope of future applications. Next, we will implement and test attention-based models originally proposed in the field of natural language processing to recognize or translate the basic TSL terms/sentences selected by the experts. Once the performance reaches the required accuracy, we will consider lightweight deep learning designs to further facilitate system implementation and integration. We hope that the proposed visual processing system can correctly recognize and evaluate the expressions of TSL learners, making a solid contribution to the popularization of TSL and helping to build a friendlier environment for hearing-impaired people in Taiwan.
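To make the recognition idea concrete, the following is a minimal, illustrative sketch of attention over a sequence of keypoint frames: each video frame's extracted keypoints are embedded, frames attend to one another via scaled dot-product self-attention, and a pooled representation is classified into a TSL term. All names, shapes, and weights here (frame count, keypoint count, untrained random matrices) are hypothetical placeholders, not the project's actual model.

```python
import numpy as np

# Hypothetical dimensions -- not taken from the project itself.
N_FRAMES = 16       # length of a short TSL clip
N_KEYPOINTS = 21    # e.g. hand keypoints per frame (assumed)
D_MODEL = 32        # embedding width
N_CLASSES = 10      # number of TSL terms to recognize (assumed)

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a (frames, d_model) sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

# Fake input: (x, y) coordinates per keypoint, flattened per frame.
keypoints = rng.standard_normal((N_FRAMES, N_KEYPOINTS * 2))

# Random (untrained) weights; in a real system these would be learned.
W_embed = rng.standard_normal((N_KEYPOINTS * 2, D_MODEL)) * 0.1
Wq, Wk, Wv = (rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(3))
W_out = rng.standard_normal((D_MODEL, N_CLASSES)) * 0.1

X = keypoints @ W_embed                  # embed each frame
H = self_attention(X, Wq, Wk, Wv)        # let frames attend to each other
logits = H.mean(axis=0) @ W_out          # pool over time, then classify
probs = softmax(logits)

print(probs.shape)   # one probability per candidate TSL term
```

In practice, a trained transformer-style model would stack several such attention layers with learned weights; the single untrained layer above only shows the data flow from keypoint sequence to class probabilities.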
Effective start/end date: 1/08/20 → 31/07/21
In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):