Because recognition technology is widely used to support language learning, a framework is needed to guide the integration of anything-to-text recognition technologies, such as speech-to-text, image-to-text, body movement-to-text, emotion-to-text, and location-to-text recognition, into learning designs. This study therefore reviews articles on anything-to-text recognition in language learning published from 2011 to 2020 and proposes an anything-to-text recognition framework. A total of 48 articles passed the selection process. The results showed that most of the articles focused on English language learning and recruited university students as participants. In addition, most of the articles aimed to foster learners' listening skills, and very few addressed writing skills. Speech-to-text recognition was commonly used to support speaking and listening skills, whereas image-to-text recognition was usually used to support reading and listening skills. Body movement-to-text, emotion-to-text, and location-to-text recognition technologies were rarely used, although they also showed potential to support language learning. Based on these findings, an anything-to-text recognition framework should consist of three layers (learning representations, recognition accuracy, and learning effects) with regard to learners' needs and imaginations in language learning supported by recognition technologies. This study also highlights research trends in the field and provides suggestions for researchers.
- Anything-to-text recognition framework
- Language learning
- Multimedia representations
- Recognition technology