TY - GEN
T1 - Towards the Design, Quality Assessment and Explainability of Synthetic Tabular Data Generation Techniques for Metabolic Syndrome Diagnosis
AU - Manjarrés, Diana
AU - Ispizua, Begoña
AU - Niño-Adan, Iratxe
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In recent years, decision-making Machine Learning (ML) approaches have evolved from traditional methods to evidence-based approaches, particularly in the healthcare sector. However, sharing data with third parties raises significant security and privacy concerns. To address these issues, researchers have explored data anonymization, distributed privacy-preserving data mining, and synthetic data generation (SDG). SDG, in particular, shows promise in enabling secure data sharing while preserving privacy, which is crucial for developing advanced AI models. This paper focuses on data for Metabolic Syndrome (MetS), a condition affecting a significant portion of the population, and investigates various synthetic tabular data generation (STDG) techniques. It evaluates the performance of an AutoML approach for predicting MetS using different percentages of synthetic data, assessed through a specific evaluation framework. Moreover, it presents an explainability and feature relevance analysis of the proposed STDG methods.
AB - In recent years, decision-making Machine Learning (ML) approaches have evolved from traditional methods to evidence-based approaches, particularly in the healthcare sector. However, sharing data with third parties raises significant security and privacy concerns. To address these issues, researchers have explored data anonymization, distributed privacy-preserving data mining, and synthetic data generation (SDG). SDG, in particular, shows promise in enabling secure data sharing while preserving privacy, which is crucial for developing advanced AI models. This paper focuses on data for Metabolic Syndrome (MetS), a condition affecting a significant portion of the population, and investigates various synthetic tabular data generation (STDG) techniques. It evaluates the performance of an AutoML approach for predicting MetS using different percentages of synthetic data, assessed through a specific evaluation framework. Moreover, it presents an explainability and feature relevance analysis of the proposed STDG methods.
KW - classification
KW - machine learning
KW - metabolic syndrome
KW - synthetic data evaluation
KW - synthetic data generation
UR - http://www.scopus.com/inward/record.url?scp=85217276748&partnerID=8YFLogxK
U2 - 10.1109/BIBM62325.2024.10822178
DO - 10.1109/BIBM62325.2024.10822178
M3 - Conference contribution
AN - SCOPUS:85217276748
T3 - Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
SP - 5009
EP - 5015
BT - Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
A2 - Cannataro, Mario
A2 - Zheng, Huiru
A2 - Gao, Lin
A2 - Cheng, Jianlin
A2 - de Miranda, Joao Luis
A2 - Zumpano, Ester
A2 - Hu, Xiaohua
A2 - Cho, Young-Rae
A2 - Park, Taesung
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Y2 - 3 December 2024 through 6 December 2024
ER -