Automatic summarization is the task of condensing data into a shorter version while preserving its key informational components and meaning. In this paper, we introduce Tech-Talk-Sum, a model that combines BERT (Bidirectional Encoder Representations from Transformers) with an attention mechanism to summarize technology talk videos. First, we present technology-talk datasets constructed from YouTube, covering both short- and long-talk videos. Second, we explore various sentence representations derived from BERT's output and find that using the top hidden layer to represent sentences works best on our datasets. The BERT outputs are fed into a Bi-LSTM network to build local context vectors. In addition, we build a document encoder layer that leverages BERT and the self-attention mechanism to capture the semantics of a video caption and to form a global context vector. Third, a unidirectional LSTM is added to bridge the local and global sentence contexts and predict each sentence's salience score. Finally, video summaries are generated from these scores. We train a single unified model on the long-talk video datasets and evaluate our methods with ROUGE. The experimental results demonstrate that our model generalizes well, outperforming the baselines and achieving state-of-the-art results on both long and short videos.
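To illustrate the final step described above, the sketch below shows how an extractive summary can be assembled once per-sentence salience scores are available. This is a minimal, hypothetical example in pure Python (the sentence list and scores are invented for illustration), not the authors' implementation: the model scores each caption sentence, and the summary keeps the top-scoring sentences in their original order.

```python
def extract_summary(sentences, scores, k=2):
    """Select the k highest-scoring sentences and return them
    in their original order, as in extractive summarization."""
    # Indices of the k sentences with the largest salience scores.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore document order so the summary reads coherently.
    return [sentences[i] for i in sorted(top)]

# Hypothetical caption sentences and salience scores from the model.
captions = [
    "Welcome everyone to the talk.",
    "Today we present a new summarization model.",
    "It combines BERT with attention.",
    "Thanks for listening.",
]
salience = [0.1, 0.9, 0.8, 0.2]

print(extract_summary(captions, salience, k=2))
```

In practice, k (or a length budget) would be tuned per dataset, since long-talk videos admit longer summaries than short ones.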