TY - GEN
T1 - Balanced-MixUp for Highly Imbalanced Medical Image Classification
AU - Galdran, Adrian
AU - Carneiro, Gustavo
AU - González Ballester, Miguel A.
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Highly imbalanced datasets are ubiquitous in medical image classification problems. In such problems, rare classes associated with less prevalent diseases are often severely under-represented in labeled databases, typically resulting in poor performance of machine learning algorithms due to overfitting in the learning process. In this paper, we propose a novel mechanism for sampling training data based on the popular MixUp regularization technique, which we refer to as Balanced-MixUp. In short, Balanced-MixUp simultaneously performs regular (i.e., instance-based) and balanced (i.e., class-based) sampling of the training data. The resulting two sets of samples are then mixed up to create a more balanced training distribution from which a neural network can effectively learn without heavily under-fitting the minority classes. We experiment with a highly imbalanced dataset of retinal images (55K samples, 5 classes) and a long-tail dataset of gastro-intestinal video frames (10K images, 23 classes), using two CNNs of varying representation capabilities. Experimental results demonstrate that Balanced-MixUp outperforms other conventional sampling schemes and loss functions specifically designed to deal with imbalanced data. Code is released at https://github.com/agaldran/balanced_mixup.
KW - Imbalanced learning
KW - Long-tail image classification
UR - https://www.scopus.com/pages/publications/85116481725
U2 - 10.1007/978-3-030-87240-3_31
DO - 10.1007/978-3-030-87240-3_31
M3 - Conference contribution
AN - SCOPUS:85116481725
SN - 9783030872397
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 323
EP - 333
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 – 24th International Conference, Proceedings
A2 - de Bruijne, Marleen
A2 - Cattin, Philippe C.
A2 - Cotin, Stéphane
A2 - Padoy, Nicolas
A2 - Speidel, Stefanie
A2 - Zheng, Yefeng
A2 - Essert, Caroline
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021
Y2 - 27 September 2021 through 1 October 2021
ER -
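
A minimal sketch of the mixing step summarized in the abstract, assuming PyTorch: one batch is drawn with regular (instance-based) sampling, another with class-balanced sampling, and the two are mixed up with a coefficient drawn from a Beta distribution. The function and tensor names here are hypothetical, and the exact weighting convention is an assumption; the authors' released implementation is at the repository URL in the record above.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def balanced_mixup(instance_batch, balanced_batch, num_classes, alpha=0.2):
    # instance_batch / balanced_batch: (images, integer labels) of equal
    # batch size; one is drawn with regular sampling, the other with
    # class-balanced sampling (e.g., a WeightedRandomSampler using
    # inverse class frequencies).
    x_i, y_i = instance_batch
    x_b, y_b = balanced_batch
    # One-hot encode labels so they can be linearly interpolated.
    y_i = F.one_hot(y_i, num_classes).float()
    y_b = F.one_hot(y_b, num_classes).float()
    # Mixing coefficient from Beta(alpha, 1); for small alpha, lam is
    # typically small, so the instance-sampled data dominates and the
    # class-balanced batch acts as a mild, minority-favoring perturbation
    # (assumed weighting convention; see the paper and repository).
    lam = Beta(alpha, 1.0).sample().item()
    x = lam * x_b + (1.0 - lam) * x_i
    y = lam * y_b + (1.0 - lam) * y_i
    return x, y

Training would then proceed with a soft-label loss, e.g. cross-entropy against the interpolated targets, as in standard MixUp.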