This paper presents a method for automatic segmentation of tympanic membranes (TMs) from video-otoscopic images based on a deep fully convolutional neural network. Built upon the UNet architecture, the proposed EAR scheme rests on three main components: EfficientNet for the encoder, Attention gates for the skip connection path, and Residual blocks for the decoder. The paper also introduces a new loss function term for training segmentation networks. Specifically, we integrate EfficientNet-B4 into the encoder of the UNet and construct the decoder from the residual blocks of the ResNet architecture. In this way, the proposed approach takes advantage of both the EfficientNet and ResNet architectures, preserving an efficient receptive field size for the model and reducing overfitting. In the skip connection path, we employ attention gates to handle the variation in shape and size of the objects of interest, which is a common issue for TM regions. For network training, we propose a new loss function term based on the shape distance between the predicted and ground truth masks, and we use stochastic weight averaging to avoid being trapped in local minima. We evaluate the proposed approach on a TM dataset of 1012 otoscopic images from patients diagnosed with and without otitis media. Experimental results show that, without any pre- or post-processing steps, the proposed approach achieves an average Dice similarity coefficient of 0.929, outperforming other state-of-the-art methods.
- Attention gate
- Tympanic membrane segmentation
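
The Dice similarity coefficient used above as the evaluation metric is commonly defined, for a predicted mask A and a ground truth mask B, as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition for binary masks, not the evaluation code used in this work. The function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks.

    Both masks are equally sized 2D lists of 0/1 values
    (an illustrative representation, not the paper's data format).
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    pred_area = 0     # pixels set in the predicted mask
    truth_area = 0    # pixels set in the ground truth mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = int(bool(p)), int(bool(t))
            intersection += p * t
            pred_area += p
            truth_area += t

    if pred_area + truth_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_area + truth_area)


# Toy example: the masks overlap on 2 pixels and each covers 3 pixels.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
score = dice_coefficient(predicted, ground_truth)
```

In the toy example the score is 2 · 2 / (3 + 3) = 2/3. A value of 1 indicates perfect overlap and 0 indicates no overlap, so the reported average of 0.929 corresponds to close agreement between predicted and ground truth TM regions.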