TY - JOUR
T1 - Deciphering pixel insights
T2 - A deep dive into deep learning strategies for enhanced indoor depth estimation
AU - Pinasthika, Krisna
AU - Utaminingrum, Fitri
AU - Lin, Chih‑Yang Y.
AU - Wada, Chikamune
AU - Shih, Timothy K.
N1 - Publisher Copyright: © 2024
PY - 2024/4
Y1 - 2024/4
N2 - Depth estimation is a crucial task for autonomous systems because it provides important information about the distance between the system and its surroundings. Traditionally, Light Detection and Ranging (lidar) sensors and stereo cameras have been used for distance measurement, despite their significant cost. In contrast, monocular cameras offer a more cost-effective solution, but lack inherent depth information. The synergy of big data and deep learning has led to various advanced architectures for monocular depth estimation. However, because monocular depth estimation is an ill-posed problem, we incorporate Attention Gates (AG) within an encoder-decoder based architecture. This helps prevent pattern recognition failures caused by variations in the sizes of objects that share identical depth values. Our research involves evaluating popular pretrained architectures, assessing the impact of using AG, and creating effective head blocks to tackle depth estimation challenges. Notably, our approach demonstrates improved evaluation metrics on the DIODE dataset, positioning Attention U-Net as a promising solution. Therefore, using Attention U-Net for monocular depth estimation on low-cost autonomous systems could reduce costs relative to measuring distance with lidar or stereo cameras. Repository: https://github.com/KrisnaPinasthika/Deciphering-Pixel-Insights
AB - Depth estimation is a crucial task for autonomous systems because it provides important information about the distance between the system and its surroundings. Traditionally, Light Detection and Ranging (lidar) sensors and stereo cameras have been used for distance measurement, despite their significant cost. In contrast, monocular cameras offer a more cost-effective solution, but lack inherent depth information. The synergy of big data and deep learning has led to various advanced architectures for monocular depth estimation. However, because monocular depth estimation is an ill-posed problem, we incorporate Attention Gates (AG) within an encoder-decoder based architecture. This helps prevent pattern recognition failures caused by variations in the sizes of objects that share identical depth values. Our research involves evaluating popular pretrained architectures, assessing the impact of using AG, and creating effective head blocks to tackle depth estimation challenges. Notably, our approach demonstrates improved evaluation metrics on the DIODE dataset, positioning Attention U-Net as a promising solution. Therefore, using Attention U-Net for monocular depth estimation on low-cost autonomous systems could reduce costs relative to measuring distance with lidar or stereo cameras. Repository: https://github.com/KrisnaPinasthika/Deciphering-Pixel-Insights
KW - Attention Gates
KW - Computer vision
KW - Deep learning
KW - Fully convolutional networks
KW - Monocular depth estimation
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85187270408&partnerID=8YFLogxK
U2 - 10.1016/j.jjimei.2024.100216
DO - 10.1016/j.jjimei.2024.100216
M3 - Journal article
AN - SCOPUS:85187270408
SN - 2667-0968
VL - 4
JO - International Journal of Information Management Data Insights
JF - International Journal of Information Management Data Insights
IS - 1
M1 - 100216
ER -