An attention enhanced sentence feature network for subtitle extraction and summarization

Chalothon Chootong, Timothy K. Shih, Ankhtuya Ochirbat, Worapot Sommool, Yung Yu Zhuang

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


Automatic subtitle summarization of videos not only tackles the problem of content overload but can also improve the performance of video retrieval, allowing viewers to efficiently access and understand the main content of a video. However, subtitle summarization is a challenging task because subtitle documents are composed of incomplete sentences, meaningless phrases, and informal language. In this paper, we introduce a novel multiple attention mechanism for subtitle summarization to address these issues. We take advantage of both Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks to capture the critical information of each sentence, which is then used to estimate its importance. Based on the salience scores of the sentences, we introduce a summary generation method that produces a summary of the video. The experiments are conducted on both subtitle documents from educational videos and text documents. To the best of our knowledge, no previous studies have applied multiple attention mechanisms to summarizing educational videos. In addition, we evaluate our model on two well-known text document datasets, DUC2002 and CNN/Daily Mail. We use ROUGE measures to evaluate the generated summaries at 95% confidence intervals. The experimental results demonstrate that our model outperforms the baseline and state-of-the-art models on the ROUGE-1, ROUGE-2, and ROUGE-L scores.
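The extractive pipeline the abstract describes — score each sentence for salience, then assemble a summary from the highest-scoring ones — can be sketched in plain Python. This is only an illustrative sketch: the salience scores here are assumed inputs (in the paper they come from the CNN/Bi-LSTM attention network), and the greedy word-budget selection and the `max_words` parameter are assumptions for illustration, not the paper's exact generation method.

```python
def generate_summary(sentences, scores, max_words=100):
    """Greedy extractive summary: pick sentences by descending salience
    score until a word budget is reached, then restore document order.
    Illustrative sketch only; scores are assumed to come from a trained
    sentence-scoring model such as the paper's CNN/Bi-LSTM network."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, words = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        # Skip a sentence that would exceed the budget (once we have something).
        if words + n > max_words and chosen:
            continue
        chosen.append(i)
        words += n
    # Emit selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]

# Hypothetical subtitle sentences with hypothetical salience scores.
sents = ["welcome to the course",
         "today we cover neural attention mechanisms",
         "um okay so",
         "attention lets the model weight important words"]
scores = [0.2, 0.9, 0.05, 0.8]
print(generate_summary(sents, scores, max_words=15))
```

With a budget of 15 words, the two most salient sentences fit and are returned in document order, while the filler fragment ("um okay so") is dropped — mirroring the informal-language problem the abstract highlights.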

Original language: English
Article number: 114946
Journal: Expert Systems with Applications
State: Published - 15 Sep 2021


  • Educational video
  • Extractive summarization
  • Integration CNN-LSTM
  • Multiple attention mechanism
  • Subtitle summarization


