Several members of the MLCommons® Medical working group have co-authored a chapter for a new book on AI in medical imaging. The chapter, “Collaborative evaluation for performance assessment of medical imaging applications”, appears in the book Trustworthy AI in Medical Imaging, part of a book series produced by the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) and Elsevier. It provides an introduction to the concept of collaborative evaluation: how healthcare stakeholders form a community to jointly evaluate AI medical imaging systems.

Collaborative Evaluation: organizing to overcome barriers to evaluating medical AI applications

AI has long been used in medical imaging, but its adoption in clinical settings has been slow, in large part because the thorough, robust evaluation that earns the trust of healthcare stakeholders has been lacking. The rise of “generative AI”, or simply “GenAI”, in this field makes comprehensive evaluation even more crucial. Such evaluation requires a large, diverse set of test data, which no single organization typically holds. Organizations therefore need to pool their data, but privacy and security regulations can make this difficult. A collaborative structure can help overcome these barriers.

Such a collaborative structure can organize the essential evaluation activities: defining data specifications, preparing data, specifying evaluation metrics, conducting evaluations, and publishing results. All of these steps should happen under a clear governance structure, with a high level of integrity and transparency enforced by open technology, so that stakeholders can trust one another and collaborate. Teams can decide how much to automate and whether to centralize or distribute these tasks. For example, a method called federated evaluation avoids data sharing altogether, preserving data ownership at the cost of more automation to handle the logistics.
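To make the federated idea concrete, here is a minimal sketch (not the chapter's implementation, and with hypothetical names such as `evaluate_locally` and `SiteMetrics`): each site scores the model on its own private test data and shares only aggregate metrics with a coordinator, so raw patient data never leaves the site.

```python
# Illustrative sketch of federated evaluation: each participating site
# evaluates the model on its local test data and shares only aggregate
# counts, never the underlying records. All names are hypothetical.
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    site_id: str
    n_cases: int   # number of local test cases
    correct: int   # correct predictions on those cases


def evaluate_locally(site_id, model, local_data):
    """Runs at the data owner's site; raw data never leaves it."""
    correct = sum(1 for x, label in local_data if model(x) == label)
    return SiteMetrics(site_id, len(local_data), correct)


def aggregate(metrics):
    """Central coordinator pools only the aggregate counts."""
    total = sum(m.n_cases for m in metrics)
    correct = sum(m.correct for m in metrics)
    return correct / total if total else 0.0


# Toy classifier and per-site data, for demonstration only
model = lambda x: x >= 0.5
site_a = [(0.9, True), (0.2, False), (0.7, True)]
site_b = [(0.1, False), (0.6, False)]

reports = [
    evaluate_locally("hospital_a", model, site_a),
    evaluate_locally("hospital_b", model, site_b),
]
print(aggregate(reports))  # pooled accuracy: 4 of 5 cases correct -> 0.8
```

The design choice this illustrates is the one named in the text: the cost of not sharing data is borne by the orchestration layer, which must distribute the model, trigger local runs, and collect the metric reports.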

The book chapter lays out a typical workflow for a collaborative evaluation process, and includes a case study of its implementation. It also delves into some of the key considerations in designing and managing collaborative evaluation, including:

  • Orchestration and scalability: how to design, provision, and coordinate the evaluation process efficiently across multiple sites;
  • Security and data/model protection: validating that the key resources (data and models) are secure, free from tampering, and compliant with regulatory requirements for medical data;
  • Integrity and transparency: making the evaluation process auditable and tamper-evident, to support accountability and traceability;
  • Data quality: avoiding “garbage in, garbage out” by ensuring that the data used to evaluate AI imaging systems is of consistently high quality.
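As a rough illustration of the data-quality point (the schema, field names, and thresholds below are hypothetical examples, not the chapter's specification), an evaluation pipeline can gate incoming records with simple consistency checks before they are allowed to influence results:

```python
# Minimal sketch of data-quality gating for an evaluation dataset:
# records failing basic consistency checks are rejected before they
# can contaminate the evaluation. Schema and rules are hypothetical.
ALLOWED_LABELS = {"benign", "malignant", "indeterminate"}
REQUIRED_FIELDS = {"case_id", "label", "pixel_spacing_mm"}


def check_record(record):
    """Returns a list of problems; an empty list means the record passes."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if record.get("label") not in ALLOWED_LABELS:
        problems.append(f"unknown label: {record.get('label')!r}")
    spacing = record.get("pixel_spacing_mm")
    if spacing is not None and not (0.01 <= spacing <= 10.0):
        problems.append(f"implausible pixel spacing: {spacing}")
    return problems


dataset = [
    {"case_id": "c1", "label": "benign", "pixel_spacing_mm": 0.5},
    {"case_id": "c2", "label": "bad-label", "pixel_spacing_mm": 0.5},
]
passed = [r for r in dataset if not check_record(r)]
print(len(passed))  # only the first record passes -> 1
```

In a multi-site setting, running the same published checks at every site is one way to keep the pooled test data consistent without centralizing it.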

The chapter closes by touching on some of the challenges and opportunities that lie ahead for collaborative evaluation. It discusses issues around sustainability: given the high cost of acquiring and preparing both data resources and AI technology systems, it is critical to ensure that stakeholders have sufficient financial incentives to continue participating. It also speaks to the need to “democratize” collaborative evaluation so that smaller industry stakeholders can join, contribute, and benefit; and the need to ensure that collaborative evaluation aligns with clinical workflow processes.

Order the book today

More information about the book Trustworthy AI in Medical Imaging, published by Elsevier, can be found here. Our chapter “Collaborative evaluation for performance assessment of medical imaging applications” is available to download here.