Tech-Talk-Sum: fine-tuning extractive summarization and enhancing BERT text contextualization for technological talk videos

Chalothon Chootong, Timothy K. Shih

Research output: Contribution to journal › Journal article › peer-review

Abstract

Automatic summarization is the task of condensing data into a shorter version while preserving the key informational components and the meaning of the content. In this paper, we introduce Tech-Talk-Sum, which combines BERT (Bidirectional Encoder Representations from Transformers) with an attention mechanism to summarize technological talk videos. We first introduce the technology-talk datasets constructed from YouTube, including short- and long-talk videos. Second, we explore various sentence representations derived from BERT's output; using the top hidden layer to represent sentences proves the best choice for our datasets. The BERT outputs are fed forward to a Bi-LSTM network to build local context vectors. In addition, we build a document-encoder layer that leverages BERT and the self-attention mechanism to express the semantics of a video caption and form a global context vector. Third, a unidirectional LSTM is added to bridge each sentence's local and global contexts and predict its salience score. Finally, video summaries are generated based on these scores. We train a single unified model on the long-talk video datasets and evaluate the proposed methods with ROUGE. The experimental results demonstrate that our model generalizes well, achieving results competitive with the baselines and the state of the art for both long and short videos.
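The abstract's scoring pipeline (contextualize sentence embeddings, form a global document vector via self-attention, then rank sentences by salience) can be illustrated with a minimal NumPy sketch. This is not the authors' actual BERT/Bi-LSTM implementation; all function names are hypothetical, and precomputed sentence embeddings stand in for BERT's top-layer representations.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_context(S):
    """Self-attention over sentence embeddings S (n, d),
    mean-pooled into one global document vector (d,)."""
    d = S.shape[1]
    A = softmax(S @ S.T / np.sqrt(d))   # (n, n) attention weights
    C = A @ S                           # contextualized sentences
    return C.mean(axis=0)

def salience_scores(S, g):
    # score each sentence by similarity to the global context
    return softmax(S @ g)

def extract_summary(S, k=2):
    """Return indices of the k highest-scoring sentences."""
    scores = salience_scores(S, global_context(S))
    return np.argsort(scores)[::-1][:k], scores

rng = np.random.default_rng(0)
S = rng.normal(size=(5, 8))             # 5 sentences, 8-dim embeddings
top_k, scores = extract_summary(S, k=2)
```

In the paper's full model, the local (Bi-LSTM) and global (self-attention) contexts are bridged by an LSTM before scoring; here a single dot-product similarity stands in for that step.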

Original language: English
Journal: Multimedia Tools and Applications
DOIs
Publication status: Accepted/In press - 2022

