TY - JOUR
T1 - On generating trustworthy counterfactual explanations
AU - Del Ser, Javier
AU - Barredo-Arrieta, Alejandro
AU - Díaz-Rodríguez, Natalia
AU - Herrera, Francisco
AU - Saranti, Anna
AU - Holzinger, Andreas
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2024/1
Y1 - 2024/1
N2 - Deep learning models like ChatGPT exemplify AI success but necessitate a deeper understanding of trust in critical sectors. Trust can be achieved using counterfactual explanations, which is how humans become familiar with unknown processes: by understanding the hypothetical input circumstances under which the output changes. We argue that generating counterfactual explanations requires attending to several properties of the generated counterfactual instances, not just their ability to flip the model's output. We present a framework for generating counterfactual explanations that formulates its goal as a multiobjective optimization problem balancing three objectives: plausibility, intensity of changes, and adversarial power. We use a generative adversarial network to model the distribution of the input data, along with a multiobjective counterfactual discovery solver that balances these objectives. We demonstrate the usefulness of our framework on six classification tasks over image and 3D data, confirming the existence of a trade-off between the objectives, the consistency of the produced counterfactual explanations with human knowledge, and the capability of the framework to unveil concept-based biases and misrepresented attributes in the input domain of the audited model. Our pioneering effort should inspire further work on the generation of plausible counterfactual explanations in real-world scenarios where attribute- or concept-based annotations are available for the domain under analysis.
AB - Deep learning models like ChatGPT exemplify AI success but necessitate a deeper understanding of trust in critical sectors. Trust can be achieved using counterfactual explanations, which is how humans become familiar with unknown processes: by understanding the hypothetical input circumstances under which the output changes. We argue that generating counterfactual explanations requires attending to several properties of the generated counterfactual instances, not just their ability to flip the model's output. We present a framework for generating counterfactual explanations that formulates its goal as a multiobjective optimization problem balancing three objectives: plausibility, intensity of changes, and adversarial power. We use a generative adversarial network to model the distribution of the input data, along with a multiobjective counterfactual discovery solver that balances these objectives. We demonstrate the usefulness of our framework on six classification tasks over image and 3D data, confirming the existence of a trade-off between the objectives, the consistency of the produced counterfactual explanations with human knowledge, and the capability of the framework to unveil concept-based biases and misrepresented attributes in the input domain of the audited model. Our pioneering effort should inspire further work on the generation of plausible counterfactual explanations in real-world scenarios where attribute- or concept-based annotations are available for the domain under analysis.
KW - Counterfactual explanations
KW - Deep learning
KW - Explainable artificial intelligence
KW - Generative adversarial networks
KW - Multi-objective optimization
UR - http://www.scopus.com/inward/record.url?scp=85177595041&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.119898
DO - 10.1016/j.ins.2023.119898
M3 - Article
AN - SCOPUS:85177595041
SN - 0020-0255
VL - 655
JO - Information Sciences
JF - Information Sciences
M1 - 119898
ER -