Summarization of produced video data (such as news or movies) aims to find the important segments that contain rich information. Users can then obtain the important messages from summaries rather than from full documents. Research in this area can be divided into two perspectives: (1) Image Processing (IP) and (2) Natural Language Processing (NLP). The former emphasizes the detection of key frames, while the latter focuses on the extraction of important concepts. This paper proposes a video summarization system, VSUM. VSUM first identifies all caption words and then applies a technique to find the important segments. An external thesaurus is also used in VSUM to enhance the summary extraction process. The experimental results show that VSUM performs well even when the accuracy of OCR (Optical Character Recognition) is limited.