
Metrics reloaded: recommendations for image analysis validation

Lena Maier-Hein*, Annika Reinke*, Patrick Godau, Minu D. Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, Tim Rädsch, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew B. Blaschko, M. Jorge Cardoso, Veronika Cheplygina, Beth A. Cimini, Gary S. Collins, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Robert Haase, Daniel A. Hashimoto, Michael M. Hoffman, Merel Huisman, Pierre Jannin, Charles E. Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris, Alan Karthikesalingam, Florian Kofler, Annette Kopp-Schneider, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering, Bjoern Menze, Karel G.M. Moons, Henning Müller, Brennan Nichyporuk, Felix Nickel, Jens Petersen, Nasir Rajpoot, Nicola Rieke, Julio Saez-Rodriguez, Clara I. Sánchez, Shravya Shetty, Maarten van Smeden, Ronald M. Summers, Abdel A. Taha, Aleksei Tiulpin, Sotirios A. Tsaftaris, Ben Van Calster, Gaël Varoquaux, Paul F. Jäger*
*Corresponding author for this work
  • German Cancer Research Center
  • Heidelberg University 
  • Frankfurt Cancer Institute
  • Imperial College London
  • University of Duisburg-Essen
  • Masaryk University
  • University of Bern
  • Simula Metropolitan Center for Digital Engineering
  • University of Tromsø – The Arctic University of Norway
  • University College London
  • King's College London
  • Universidad de Buenos Aires
  • McGill University
  • Indiana University Bloomington
  • University of Pennsylvania
  • Holon Institute of Technology
  • European Federation for Medical Informatics
  • KU Leuven
  • IT University of Copenhagen
  • Broad Institute
  • National Institutes of Health
  • Ciudad Autónoma de Buenos Aires
  • University of Adelaide
  • Fraunhofer Institute for Digital Medicine
  • Radboud University Nijmegen
  • Technische Universität Dresden
  • Center for Systems Biology Dresden
  • Leipzig University
  • Princess Margaret Cancer Centre
  • University of Toronto
  • Vector Institute
  • LTSI - UMR 1099
  • Institut national de la santé et de la recherche médicale
  • Max Delbrück Center for Molecular Medicine in the Helmholtz Association
  • University of Potsdam
  • Friedrich-Alexander University Erlangen-Nürnberg
  • IHU Strasbourg
  • Alphabet Inc.
  • Helmholtz AI
  • European Molecular Biology Laboratory
  • Stony Brook University
  • Vanderbilt University
  • University Health Network
  • Sunnybrook Research Institute
  • University of New South Wales
  • University of Zurich
  • Utrecht University
  • University of Applied Sciences Western Switzerland
  • University of Geneva
  • MILA (Québec Artificial Intelligence Institute)
  • University of Hamburg
  • University of Warwick
  • NVIDIA
  • University of Amsterdam
  • Vienna University of Technology
  • University of Oulu
  • University of Edinburgh
  • Leiden University
  • Institut national de recherche en informatique et en automatique

Research output: Contribution to journal › Article › peer-review

318 Citations (Scopus)

Abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
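The abstract frames semantic segmentation as classification at the pixel level. As a minimal sketch of what that means in practice (this is not code from the Metrics Reloaded framework or its online tool), the Python snippet below compares a flattened binary reference mask with a predicted mask, counts per-pixel true/false positives and negatives, and derives two widely used overlap metrics, the Dice similarity coefficient (2TP / (2TP + FP + FN)) and intersection over union (TP / (TP + FP + FN)). The function names and the convention of scoring two empty masks as 1.0 are illustrative assumptions; other tools treat that case as undefined.

```python
from typing import Sequence, Tuple


def confusion_counts(reference: Sequence[int], prediction: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for two flattened binary masks (1 = foreground)."""
    if len(reference) != len(prediction):
        raise ValueError("masks must contain the same number of pixels")
    tp = fp = fn = tn = 0
    for r, p in zip(reference, prediction):
        if r and p:
            tp += 1  # foreground in both masks
        elif p:
            fp += 1  # predicted foreground, reference background
        elif r:
            fn += 1  # reference foreground, predicted background
        else:
            tn += 1  # background in both masks
    return tp, fp, fn, tn


def dice(reference: Sequence[int], prediction: Sequence[int]) -> float:
    tp, fp, fn, _ = confusion_counts(reference, prediction)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(reference: Sequence[int], prediction: Sequence[int]) -> float:
    tp, fp, fn, _ = confusion_counts(reference, prediction)
    denom = tp + fp + fn
    # Same empty-mask assumption as in dice().
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    reference = [0, 1, 1, 1, 0, 0]
    prediction = [0, 1, 1, 0, 1, 0]
    print(confusion_counts(reference, prediction))  # (2, 1, 1, 2)
    print(round(dice(reference, prediction), 3))    # 0.667
    print(iou(reference, prediction))               # 0.5
```

Neither formula uses the true-negative count, so both metrics ignore how much background an image contains. This is one example of a metric property that matters when deciding whether a metric fits a given problem.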

Original language: English
Pages (from-to): 195-212
Number of pages: 18
Journal: Nature Methods
Volume: 21
Issue number: 2
DOIs
Publication status: Published - Feb 2024
Externally published: Yes
