TY - JOUR
T1 - On the caveats of AI autophagy
AU - Xing, Xiaodan
AU - Shi, Fadong
AU - Huang, Jiahao
AU - Wu, Yinzhe
AU - Nan, Yang
AU - Zhang, Sheng
AU - Fang, Yingying
AU - Roberts, Michael
AU - Schönlieb, Carola-Bibiane
AU - Del Ser, Javier
AU - Yang, Guang
N1 - Publisher Copyright:
© Springer Nature Limited 2025.
PY - 2025/2
Y1 - 2025/2
AB - Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, which are now mixed with unlabelled synthetic data. This trend, known as the AI autophagy phenomenon, suggests a future where generative AI systems may increasingly consume their own outputs without discernment, raising concerns about model performance, reliability and ethical implications. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? To address these research questions, this Perspective examines the existing literature, delving into the consequences of AI autophagy, analysing the associated risks and exploring strategies to mitigate its impact. Our aim is to provide a comprehensive perspective on this phenomenon, advocating for a balanced approach that promotes the sustainable development of generative AI technologies in the era of large models.
UR - http://www.scopus.com/inward/record.url?scp=85218812286&partnerID=8YFLogxK
DO - 10.1038/s42256-025-00984-1
M3 - Article
AN - SCOPUS:85218812286
SN - 2522-5839
VL - 7
SP - 172
EP - 180
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
IS - 2
ER -