Video captioning based on joint image-audio deep learning techniques

Chien Yao Wang, Pei Sin Liaw, Kai Wen Liang, Jai Ching Wang, Pao Chi Chang

Research output: Book/Report contribution › Conference paper chapter › Peer-reviewed

2 citations (Scopus)

Abstract

With the advancement of technology, deep learning has been widely used for various multimedia applications. Here, we apply it to video captioning. The proposed system uses separate neural networks to extract features from image, audio, and semantic signals. The image and audio features are concatenated before being fed into a long short-term memory (LSTM) network for initialization; the joint audio-image features enrich the overall semantics and yield a network with better performance. The bilingual evaluation understudy (BLEU) algorithm, an automatic sentence-scoring metric, was used to score the generated captions at n-gram lengths of one to four words. All BLEU scores increased by more than 1%, the CIDEr-D score increased by 2.27%, and the METEOR and ROUGE-L scores increased by 0.2% and 0.7%, respectively. These are substantial improvements.
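The fusion step described in the abstract, concatenating per-modality features and using the joint vector to initialize the caption decoder's LSTM, can be sketched briefly. Below is a minimal PyTorch illustration, not the paper's implementation: the class name, feature dimensions, tanh projection, and teacher-forced decoding are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class JointAVCaptioner(nn.Module):
    # Hypothetical sketch: a concatenated image-audio feature vector
    # initializes the hidden state of an LSTM caption decoder.
    # All dimensions are illustrative, not taken from the paper.
    def __init__(self, img_dim=2048, aud_dim=128, hidden=512, vocab=10000):
        super().__init__()
        self.fuse = nn.Linear(img_dim + aud_dim, hidden)  # joint projection
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, aud_feat, captions):
        # img_feat: (B, img_dim) from an image CNN;
        # aud_feat: (B, aud_dim) from an audio network.
        joint = torch.tanh(self.fuse(torch.cat([img_feat, aud_feat], dim=1)))
        h0 = joint.unsqueeze(0)      # (1, B, hidden): LSTM initial hidden state
        c0 = torch.zeros_like(h0)    # zero initial cell state
        emb = self.embed(captions)   # (B, T, hidden): teacher-forced tokens
        states, _ = self.lstm(emb, (h0, c0))
        return self.out(states)      # (B, T, vocab): next-token logits

model = JointAVCaptioner()
logits = model(torch.randn(4, 2048), torch.randn(4, 128),
               torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 12, 10000])
```

For evaluation, the BLEU-1 to BLEU-4 scores mentioned above correspond to n-gram lengths of one to four; an off-the-shelf scorer such as NLTK's nltk.translate.bleu_score.sentence_bleu with the appropriate n-gram weights computes the same family of metrics.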

Original language: English
Host publication title: Proceedings - 2019 IEEE 9th International Conference on Consumer Electronics, ICCE-Berlin 2019
Editors: Gordan Velikic, Christian Gross
Publisher: IEEE Computer Society
Pages: 127-131
Number of pages: 5
ISBN (electronic): 9781728127453
Publication status: Published - September 2019
Event: 9th IEEE International Conference on Consumer Electronics, ICCE-Berlin 2019 - Berlin, Germany
Duration: 8 September 2019 - 11 September 2019

Publication series

Name: IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin
Volume: 2019-September
ISSN (print): 2166-6814
ISSN (electronic): 2166-6822

Conference

Conference: 9th IEEE International Conference on Consumer Electronics, ICCE-Berlin 2019
Country/Territory: Germany
City: Berlin
Period: 8/09/19 - 11/09/19
