TY - JOUR
T1 - Uncertainty-aware segmentation quality prediction via deep learning Bayesian Modeling
T2 - Comprehensive evaluation and interpretation on skin cancer and liver segmentation
AU - Sikha, O. K.
AU - Riera-Marín, Meritxell
AU - Galdran, Adrian
AU - López, Javier García
AU - Rodríguez-Comas, Júlia
AU - Piella, Gemma
AU - Ballester, Miguel A. González
N1 - Publisher Copyright: © 2025
PY - 2025/7
Y1 - 2025/7
N2 - Image segmentation is a critical step in computational biomedical image analysis, typically evaluated using metrics like the Dice coefficient during training and validation. However, in clinical settings without manual annotations, assessing segmentation quality becomes challenging, and models lacking reliability indicators face adoption barriers. To address this gap, we propose a novel framework for predicting segmentation quality without requiring ground truth annotations during test time. Our approach introduces two complementary frameworks: one leveraging predicted segmentation and uncertainty maps, and another integrating the original input image, uncertainty maps, and predicted segmentation maps. We present Bayesian adaptations of two benchmark segmentation models (SwinUNet and Feature Pyramid Network with ResNet50) using Monte Carlo Dropout, Ensemble, and Test Time Augmentation to quantify uncertainty. We evaluate four uncertainty estimates (confidence map, entropy, mutual information, and expected pairwise Kullback–Leibler divergence) on 2D skin lesion and 3D liver segmentation datasets, analyzing their correlation with segmentation quality metrics. Our framework achieves an R² score of 93.25 and Pearson correlation of 96.58 on the HAM10000 dataset, outperforming previous segmentation quality assessment methods. For 3D liver segmentation, Test Time Augmentation with entropy achieves an R² score of 85.03 and a Pearson correlation of 65.02, demonstrating cross-modality robustness. Additionally, we propose an aggregation strategy that combines multiple uncertainty estimates into a single score per image, offering a more robust and comprehensive assessment of segmentation quality compared to evaluating each measure independently. The proposed uncertainty-aware segmentation quality prediction network is interpreted using gradient-based methods such as Grad-CAM and feature embedding analysis through UMAP. These techniques provide insights into the model's behavior and reliability, helping to assess the impact of incorporating uncertainty into the segmentation quality prediction pipeline. The code is available at: https://github.com/sikha2552/Uncertainty-Aware-Segmentation-Quality-Prediction-Bayesian-Modeling-with-Comprehensive-Evaluation-.
KW - Explainable AI
KW - Ground-truth free performance evaluation
KW - Image segmentation
KW - Uncertainty aggregate score
KW - Uncertainty quantification
UR - https://www.scopus.com/pages/publications/105002807192
U2 - 10.1016/j.compmedimag.2025.102547
DO - 10.1016/j.compmedimag.2025.102547
M3 - Article
AN - SCOPUS:105002807192
SN - 0895-6111
VL - 123
JO - Computerized Medical Imaging and Graphics
JF - Computerized Medical Imaging and Graphics
M1 - 102547
ER -