TY - GEN
T1 - Comprehensive Analysis of Different Techniques for Data Augmentation and Proposal of New Variants of BOSME and GAN
AU - Garmendia-Orbegozo, Asier
AU - Nuñez-Gonzalez, Jose David
AU - Anton Gonzalez, Miguel Angel
AU - Graña, Manuel
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - In many environments where detecting minority-class instances is critical, the data available for training Machine Learning models is poorly distributed. This data imbalance usually degrades the trained model, which generalises towards the majority class and misclassifies minority-class instances as majority-class instances. To avoid this, several techniques have been adopted in the literature to expand the original database, such as Generative Adversarial Networks (GANs) and the Bayesian network-based over-sampling method (BOSME). Starting from these two methods, in this work we propose three new data augmentation variants to address the data imbalance issue. We use traffic event data from three different areas of California, divided into two subgroups according to their severity. Experiments show that the best performance was reached when using our variants. The importance of data augmentation as a preprocessing tool is also demonstrated by the performance drop observed in systems trained on the original imbalanced databases.
AB - In many environments where detecting minority-class instances is critical, the data available for training Machine Learning models is poorly distributed. This data imbalance usually degrades the trained model, which generalises towards the majority class and misclassifies minority-class instances as majority-class instances. To avoid this, several techniques have been adopted in the literature to expand the original database, such as Generative Adversarial Networks (GANs) and the Bayesian network-based over-sampling method (BOSME). Starting from these two methods, in this work we propose three new data augmentation variants to address the data imbalance issue. We use traffic event data from three different areas of California, divided into two subgroups according to their severity. Experiments show that the best performance was reached when using our variants. The importance of data augmentation as a preprocessing tool is also demonstrated by the performance drop observed in systems trained on the original imbalanced databases.
KW - Data augmentation
KW - Data imbalance
KW - GANs
UR - http://www.scopus.com/inward/record.url?scp=85172221747&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-40725-3_13
DO - 10.1007/978-3-031-40725-3_13
M3 - Conference contribution
AN - SCOPUS:85172221747
SN - 9783031407246
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 145
EP - 155
BT - Hybrid Artificial Intelligent Systems - 18th International Conference, HAIS 2023, Proceedings
A2 - García Bringas, Pablo
A2 - Pérez García, Hilde
A2 - Martínez de Pisón, Francisco Javier
A2 - Martínez Álvarez, Francisco
A2 - Troncoso Lora, Alicia
A2 - Herrero, Álvaro
A2 - Calvo Rolle, José Luis
A2 - Quintián, Héctor
A2 - Corchado, Emilio
PB - Springer Science and Business Media Deutschland GmbH
T2 - Proceedings of the 18th International Conference on Hybrid Artificial Intelligence Systems, HAIS 2023
Y2 - 5 September 2023 through 7 September 2023
ER -