Multi-modal Chatbot in Intelligent Manufacturing

Tzu Yu Chen, Yu Ching Chiu, Nanyi Bi, Richard Tzong Han Tsai

Research output: Contribution to journal › Article › peer-review


Artificial intelligence (AI) has been widely used across industries. In this work, we focus on what AI can do in manufacturing, in the form of a chatbot. We designed a chatbot that helps users complete an assembly task simulating those in manufacturing settings. To recreate this setting, we have users assemble a Meccanoid robot through multiple stages with the help of an interactive dialogue system. By classifying users' intent, the chatbot can provide answers or instructions when users encounter problems during the assembly process. Our goal is to improve the system so that it captures users' needs by detecting their intent and thus provides relevant, helpful information. However, in a multi-step task, we cannot rely on intent classification with the user's question utterance as the only input, because questions raised at different steps may share the same intent yet require different responses. In this paper, we propose two methods to address this problem. The first captures not only textual features but also visual features through the YOLO-based Masker with CNN (YMC) model. The second uses an autoencoder to encode multi-modal features for user intent classification. By incorporating visual information, we significantly improve the chatbot's performance in experiments conducted on different datasets.
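The pipeline the abstract describes can be sketched as follows. This is a minimal illustration of the general idea of multi-modal intent classification, not the paper's actual YMC or autoencoder architecture: a text feature vector and a visual feature vector are concatenated, compressed through an encoder-style bottleneck, and mapped to intent logits. All dimensions, function names, and the use of random stand-in encoders and untrained weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_DIM, VIS_DIM, CODE_DIM, N_INTENTS = 128, 64, 32, 10

# Stand-ins for a sentence encoder and a YOLO-style visual feature
# extractor; real models would produce these vectors from the user's
# utterance and the camera frame of the current assembly step.
def encode_text(utterance: str) -> np.ndarray:
    return rng.standard_normal(TEXT_DIM)

def encode_frame(frame_id: int) -> np.ndarray:
    return rng.standard_normal(VIS_DIM)

# Encoder half of an autoencoder: a single linear bottleneck with tanh,
# followed by a linear intent classifier (weights untrained here).
W_enc = rng.standard_normal((TEXT_DIM + VIS_DIM, CODE_DIM)) * 0.1
W_cls = rng.standard_normal((CODE_DIM, N_INTENTS)) * 0.1

def classify_intent(utterance: str, frame_id: int) -> int:
    # Fuse the two modalities by concatenation, then compress and classify.
    fused = np.concatenate([encode_text(utterance), encode_frame(frame_id)])
    code = np.tanh(fused @ W_enc)   # multi-modal bottleneck code
    logits = code @ W_cls           # one logit per intent class
    return int(np.argmax(logits))

intent = classify_intent("Which screw goes here?", frame_id=3)
print(intent)
```

The key point mirrored from the paper is that the same utterance paired with different frames can yield different fused vectors, so the predicted intent (and hence the response) can differ across assembly steps even when the question text is identical.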

Original language: English
Journal: IEEE Access
State: Accepted/In press - 2021


  • Artificial intelligence
  • Chatbot
  • Manufacturing
  • Robotic assembly
  • Robots
  • Task analysis
  • Visualization
  • human-robot interaction
  • multi-modal intent classification
