Research Working Group

Medical Working Group


Design benchmarks and propose best practices to accelerate the development of AI and machine learning for healthcare, catalyze new markets, and ultimately improve patient outcomes and provider experience.


Machine learning (ML) has tremendous potential to improve medical treatment, but it is held back by several obstacles: the complexity of obtaining reliable access to data; electronic health record (EHR) systems that, by default, are not designed to integrate AI into their workflows; limited and inconsistent evaluation of ML models, which leaves a gap between research results and clinical efficacy; regulatory requirements from the FDA, the EMA, and other bodies governing medical AI; and the need to educate both patients and providers about AI's capabilities and limitations.

Access to medical data is constrained by a number of factors, including data privacy regulations (e.g., HIPAA and GDPR), the fragmented nature of healthcare, the cost and difficulty of accurately labeling data, and the separation between the medical organizations that hold the data and the other organizations that would like to analyze or leverage it independently. Beyond data access, promising research models often disappoint in clinical practice, in part because of insufficient or inconsistent model maturity, because they do not account for how clinical workflows function, or because of other factors (e.g., data readiness and governance). This lack of standardized evaluation also makes it difficult to compare models, both to drive new research and to guide the choice of models for use in products or services.

Recently, the FDA has begun examining how to regulate AI products that act autonomously (i.e., without review by a provider) or that have a large impact on patients even when a care team reviews their output. While the FDA is focusing more on the "process certification" side, data access remains a real factor slowing the development of potential innovations. Federated Learning (FL), in which ML models are trained on multiple datasets without sharing the data itself, has recently been shown to have the potential to unite fragmented data while preserving data privacy and significantly improving model maturity.
For example, when multiple hospitals use FL to collectively build a better model, each hospital's data never leaves its premises; only model updates (such as learned weights), not patient records, are shared, so the hospitals can collaborate on ML in a privacy-preserving manner.
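
To make the hospital example concrete, here is a minimal Python sketch of one widely used aggregation scheme, federated averaging (FedAvg). The working group does not specify an algorithm, so the choice of FedAvg, the toy one-feature linear model, and the names `local_update` and `federated_averaging` are all illustrative assumptions, and the three "hospitals" hold synthetic, hypothetical data. The key property is that each site's samples stay inside `local_update`; the coordinating server only ever sees model weights and sample counts.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Train a 1-D linear model y = w*x + b on ONE site's private data.

    Runs a few epochs of full-batch gradient descent on mean squared error.
    Only the resulting (w, b) tuple leaves the site, never the data.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_averaging(global_weights, sites, rounds=50):
    """FedAvg-style loop (illustrative): each round, every site trains locally
    from the current global weights, and the server averages the returned
    weights, weighted by each site's sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        global_weights = tuple(
            sum(weights[i] * n for weights, n in updates) / total
            for i in range(len(global_weights))
        )
    return global_weights


# Hypothetical sites, each privately holding noisy samples of y = 3x + 1.
rng = random.Random(0)


def make_site(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


sites = [make_site(n) for n in (20, 50, 30)]
w, b = federated_averaging((0.0, 0.0), sites)
print(f"learned w={w:.2f}, b={b:.2f}")  # should land close to w=3, b=1
```

Weighting by sample count keeps a small site from pulling the shared model as hard as a large one; real FL frameworks add secure communication, partial participation, and often privacy techniques on top of this basic loop.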

The MLPerf™ Medical Accuracy working group aims to unlock ML's potential to positively impact healthcare by developing standardized metrics and best practices for models and datasets, helping to accelerate medical ML technology by improving data access and model evaluation in Federated Learning settings. We believe these efforts will give key stakeholders the confidence to trust models and the data and processes they rely on, thereby accelerating ML adoption in clinical settings, potentially improving patient outcomes, optimizing healthcare costs, and improving provider experience.


  1. Initial efforts will focus on developing a Proof-of-Concept (PoC) using publicly available medical data to help:

    • Build confidence for discussions about participating with real clinical data
    • Validate our technological approach
    • Enhance our understanding of system requirements
    • Serve as a public example of federated learning in the medical space
    • Demonstrate model evaluation and data utility under different scenarios (e.g., unlabeled multi-sourced data, labeled multi-sourced data, fixed model, fine-tuned model)
    • Publish a white paper with our findings
  2. Long-term efforts will focus on:

    • Extending the initial PoC features with benchmarks for methods in data utility, model evaluation, and data privacy
    • Improving technical integration with Federated Learning frameworks
    • Building partnerships with multiple medical organizations
    • Facilitating clinical data access
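
One way to evaluate a fixed model across several data sources without pooling patient records is for each site to compute aggregate confusion counts locally and share only those counts. The Python sketch below illustrates that idea; the working group has not published an evaluation protocol, so the `ConfusionCounts`, `local_counts`, and `pooled_metrics` names, the choice of sensitivity and specificity as metrics, and the two sites' toy predictions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Aggregate counts a site can share instead of individual records."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def __add__(self, other):
        return ConfusionCounts(self.tp + other.tp, self.fp + other.fp,
                               self.tn + other.tn, self.fn + other.fn)


def local_counts(predictions, labels):
    """Runs inside a site: compare a fixed model's binary predictions
    with that site's local labels and return only the counts."""
    c = ConfusionCounts()
    for pred, label in zip(predictions, labels):
        if pred and label:
            c.tp += 1
        elif pred and not label:
            c.fp += 1
        elif not pred and not label:
            c.tn += 1
        else:
            c.fn += 1
    return c


def pooled_metrics(site_counts):
    """Runs at the coordinator: sum the shared counts, then compute metrics."""
    total = sum(site_counts, ConfusionCounts())
    sensitivity = total.tp / (total.tp + total.fn)
    specificity = total.tn / (total.tn + total.fp)
    return sensitivity, specificity


# Hypothetical predictions and labels held privately by two sites.
site_a = local_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
site_b = local_counts([0, 1, 1, 0], [0, 1, 1, 0])
sens, spec = pooled_metrics([site_a, site_b])
print(f"pooled sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints "pooled sensitivity=0.80, specificity=0.75"
```

Summing counts before dividing reproduces exactly the metric that centralized data would give, whereas averaging each site's rates would over-weight small sites.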

Meeting Schedule

Weekly on Mondays, 9:00 to 10:00 AM Pacific.

Mailing List

Working Group Resources

Google Drive

Working Group Chair Emails

Alexandros Karargyris (

Renato Umeton (

Working Group Chair Bios

Alexandros Karargyris currently works at his own startup. Previously, he worked as a researcher at IBM and the NIH for more than 10 years. His research interests lie in medical imaging, mobile health, and machine learning. He has contributed to commercial healthcare products and to imaging solutions deployed in under-resourced areas. His work has been published in peer-reviewed journals and conferences.


Renato Umeton heads a data science team in the Informatics & Analytics department of Dana-Farber Cancer Institute, a teaching affiliate of Harvard Medical School. His work focuses on operations: translating AI from the research realm into the clinic and into enterprise software offerings that benefit patients and cancer researchers at scale. Renato started working on big data and data science in 2007, when these areas were yet to be well defined. Since then, he has published in several scientific fields and has worked in academia, consulting, hospital settings, and biotech. He is also currently affiliated with the Harvard School of Public Health, the Massachusetts Institute of Technology, and Cornell.