TY - JOUR
T1 - Non-imaging Medical Data Synthesis for Trustworthy AI
T2 - A Comprehensive Survey
AU - Xing, Xiaodan
AU - Wu, Huanjun
AU - Wang, Lichao
AU - Stenson, Iain
AU - Yong, May
AU - Del Ser, Javier
AU - Walsh, Simon
AU - Yang, Guang
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/4/9
Y1 - 2024/4/9
N2 - Data quality is a key factor in the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can improve the accuracy, robustness, and privacy of downstream AI algorithms. However, access to high-quality datasets is limited by the technical difficulties of data acquisition, and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with distributions similar to real clinical data, can serve as a potential solution to address the scarcity of good-quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Therefore, in this article, we review synthesis algorithms, particularly for non-imaging medical data, with the aim of supporting trustworthy AI in this domain. This tutorial-style review article provides comprehensive descriptions of non-imaging medical data synthesis, covering aspects such as algorithms, evaluations, limitations, and future research directions.
KW - Medical data synthesis
KW - electronic healthcare records
UR - http://www.scopus.com/inward/record.url?scp=85187469162&partnerID=8YFLogxK
U2 - 10.1145/3614425
DO - 10.1145/3614425
M3 - Article
AN - SCOPUS:85187469162
SN - 0360-0300
VL - 56
SP - 1
EP - 35
JO - ACM Computing Surveys
JF - ACM Computing Surveys
IS - 7
ER -