Automatic Speaker Localization in Conference Based on Yolox-Tiny and TDOA

Chen Chiung Hsieh, Meng Ju Lu, You Zhan Zheng, Hsiao Ting Tseng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In remote conferences, the camera angle often needs to be adjusted so that only the speaker remains in the frame. Most current approaches, however, adjust the camera manually. This study uses the YOLOX-tiny neural network to detect, in real time, the mouth movements and upper-body positions of everyone in the camera image, and combines this with time difference of arrival (TDOA) estimation of the sound-source direction to improve the accuracy of the detection results. The recall of YOLOX-tiny is 93%, the recall of TDOA is 88%, the recall of speaker localization using video alone is 77%, and the recall when video and sound are integrated is about 80.3%, so the speaker's image can be retained quickly and effectively.
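As an illustration of the TDOA idea named in the abstract, the following is a minimal Python sketch and not the authors' implementation, which this record does not describe. It assumes two microphones, a far-field source, and a speed of sound of 343 m/s. The function names, the brute-force cross-correlation, and the sample values are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag];
    the lag with the highest correlation score wins.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_to_angle(delay_samples, sample_rate, mic_spacing):
    """Convert a delay into a direction of arrival in degrees.

    Far-field model: sin(angle) = c * delay / spacing, where 0 degrees
    is straight ahead (broadside to the microphone pair).
    """
    delay_seconds = delay_samples / sample_rate
    ratio = SPEED_OF_SOUND * delay_seconds / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against noisy estimates
    return math.degrees(math.asin(ratio))


# Hypothetical input: a short burst that reaches microphone B
# three samples after it reaches microphone A.
mic_a = [0.0] * 64
mic_a[20:25] = [1.0, -0.5, 0.8, -0.3, 0.6]
mic_b = [0.0] * 3 + mic_a[:-3]

lag = estimate_delay_samples(mic_a, mic_b, max_lag=8)
angle = tdoa_to_angle(lag, sample_rate=16000, mic_spacing=0.2)
print(f"lag = {lag} samples, angle = {angle:.1f} degrees")
```

In a combined audio and video system, an angle like this could be compared with the horizontal positions of the people detected in the image. How the paper actually fuses the two sources is not stated here.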

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 161-162
Number of pages: 2
ISBN (Electronic): 9781665470506
State: Published - 2022
Event: 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022 - Taipei, Taiwan
Duration: 6 Jul 2022 to 8 Jul 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022

Conference

Conference: 2022 IEEE International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2022
Country/Territory: Taiwan
City: Taipei
Period: 6/07/22 to 8/07/22
