Research Working Group

Medical Working Group


Design benchmarks and propose best practices to accelerate the development of AI and machine learning for healthcare, catalyze new markets, and ultimately improve patient outcomes and provider experience.


Machine learning (ML) has tremendous potential to improve medical treatment, but it is held back by several factors: the complexity of obtaining reliable access to data; electronic health record (EHR) systems that are not designed to integrate AI into their workflows; limited and inconsistent evaluation of ML models, which leaves a gap between research results and clinical translation and efficacy; regulatory oversight of medical AI by the FDA, EMA, and other governing bodies; and the need to educate both patients and providers on AI capabilities and limitations. Access to medical data is constrained by data protection regulations (e.g., HIPAA, GDPR), the fragmented nature of health care, the cost and difficulty of accurately labeling data, and the separation between the medical organizations that hold the data and the other organizations that would like to analyze or leverage it independently. Beyond data access, promising research models often disappoint in clinical practice, in part because of inconsistent and insufficient model maturity, because they did not account for how clinical workflows function, or because of other factors (e.g., data readiness, governance). The lack of standardized evaluation also makes it difficult to compare models, both to drive new research and to guide the choice of models for products or services. Recently, the FDA has started to consider how to regulate AI products that would act autonomously (i.e., without provider review) or would have a large impact on patients (even when still reviewed by a care team). While the FDA is focusing mainly on “process certification,” data access remains a major factor slowing the development of potential innovations. Federated Learning (FL), in which ML models are trained on multiple datasets without sharing the data, has recently shown the potential to unite fragmented data while preserving data privacy, and to significantly increase model maturity.
As an example of the FL approach, consider multiple hospitals collectively building a better model: each hospital’s data never leaves its premises, and only model updates are shared, so the ML effort proceeds in a privacy-preserving manner.
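To make the hospital example concrete, here is a minimal sketch of one common FL scheme, federated averaging (FedAvg). It is an illustration under stated assumptions, not the working group’s method: each “hospital” holds a hypothetical private list of (x, y) pairs for a toy 1-D linear model y = w·x + b, trains locally with gradient descent, and returns only its parameters (w, b) and sample count; the server combines them with a sample-weighted average. The function names (`local_update`, `federated_round`, `train`) and the toy data are hypothetical. Production platforms such as OpenFL add secure communication, aggregation, and governance on top of this basic loop.

```python
"""Minimal federated averaging (FedAvg) sketch.

Assumptions (illustrative only, not the working group's method):
- each hospital holds a private list of (x, y) pairs for a 1-D linear model y = w*x + b;
- only model parameters (w, b) and sample counts leave a hospital, never the raw data;
- the server combines parameters with a sample-weighted average, as in FedAvg.
"""
from typing import List, Tuple

Params = Tuple[float, float]         # model parameters (w, b)
Dataset = List[Tuple[float, float]]  # private (x, y) pairs kept on-site


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 20) -> Params:
    """Run full-batch gradient descent on mean squared error, entirely on-site."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # only the updated parameters are sent back


def federated_round(global_params: Params, hospitals: List[Dataset]) -> Params:
    """One round: every site trains locally, then the server averages the results."""
    updates = [(local_update(global_params, data), len(data)) for data in hospitals]
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


def train(hospitals: List[Dataset], rounds: int = 100) -> Params:
    """Start from zero parameters and run several federated rounds."""
    params: Params = (0.0, 0.0)
    for _ in range(rounds):
        params = federated_round(params, hospitals)
    return params


if __name__ == "__main__":
    # Hypothetical toy data: three sites sampling y = 2x + 1 over different x ranges,
    # mimicking hospitals whose patient populations differ.
    sites = [
        [(x / 10, 2 * (x / 10) + 1) for x in range(0, 10)],
        [(x / 10, 2 * (x / 10) + 1) for x in range(10, 15)],
        [(x / 10, 2 * (x / 10) + 1) for x in range(15, 30)],
    ]
    w, b = train(sites)
    print(f"w={w:.3f}, b={b:.3f}")  # expected to approach w = 2, b = 1 on this noise-free data
```

Weighting by sample count lets larger sites contribute proportionally more, while the server never sees any individual (x, y) record.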

The MLPerf™ Medical Accuracy working group aims to unlock ML’s potential to improve healthcare by developing standardized metrics and best practices for models and datasets, accelerating medical ML technology through better data access and model evaluation in Federated Learning. We believe these efforts will give key stakeholders the confidence to trust models and the data and processes they rely on, thereby accelerating ML adoption in clinical settings and potentially improving patient outcomes, optimizing healthcare costs, and improving provider experience.


  1. Initial efforts will focus on developing a Proof-of-Concept (PoC) using publicly available medical data to help:

    • Build confidence for discussions of participation with real clinical data
    • Validate our technological approach
    • Enhance our understanding of system requirements
    • Serve as a public example of federated learning in the medical space
    • Demonstrate model evaluation and data utility under different scenarios (e.g., unlabeled multi-sourced data, labeled multi-sourced data, fixed model, fine-tuned model)
    • Publish a white paper with our findings
  2. Long-term efforts will focus on:

    • Extending initial PoC features with benchmarks for methods in data utility, model evaluation, and data privacy
    • Improving technical integration with Federated Learning frameworks
    • Building partnerships with multiple medical organizations
    • Facilitating clinical data access

Meeting Schedule

Weekly on Mondays from 9:00 to 10:00 AM Pacific.

Mailing List

Working Group Resources

Google Drive

Working Group Chair Emails

Alexandros Karargyris (

Renato Umeton (

Micah Sheller (, Vice Chair

Working Group Chair Bios

Alexandros Karargyris is a senior researcher at the Institute of Image-Guided Surgery of Strasbourg, a unique place for translational clinical research. He leads projects at the intersection of surgery and artificial intelligence (AI). Previously, he worked as a researcher at IBM and NIH for more than 10 years. His research interests lie in medical imaging, machine learning, and mobile health. He has contributed to commercial healthcare products and to imaging solutions deployed in under-resourced areas. His work has been published in peer-reviewed journals and conferences.


Renato Umeton heads artificial intelligence (AI) and data science in the Informatics & Analytics department of Dana-Farber Cancer Institute, a teaching affiliate of Harvard Medical School. His work focuses on operations: translating AI from the research realm into the clinic and into enterprise software offerings that benefit patients and cancer researchers at scale. Renato started working on artificial intelligence, data science, and big data in 2007, before these areas were well defined. Since then, he has published in several scientific fields and has worked in academia, consulting, hospital settings, and biotech. He is also currently affiliated with Harvard School of Public Health, Massachusetts Institute of Technology, and Weill Cornell Medicine.


Micah Sheller is a senior research scientist in Intel’s Security and Privacy Research Labs, where he leads secure federated learning research. He developed the first version of the OpenFL open-source federated learning platform and is the technical federated learning lead for the FeTS initiative, which recently trained a 3DResUNet across 53 hospitals. Micah has had the pleasure of working on a wide range of projects since his first Intel internship in 1999, when he worked on the Intel Web Tablet. Work on USB 3.0, the prototype Intel SGX software runtime, passive and continuous biometrics, and more has kept Micah happily learning and making friends throughout his career.
