Science Working Group


Evaluate, Organize, Curate, and Integrate artifacts around Applications, Models (algorithms), Infrastructure, and the 3 MLCommons Pillars: Benchmarks, Datasets, and Best Practices. These artifacts are open source and accessible through the MLCommons GitHub. Our input comes from independently funded activities and experts in Industry, Government, and Research.

This includes the following activities.

Encourage and support the curation of large-scale experimental and scientific datasets and the engineering of ML benchmarks operating on those datasets. The WG will engage with scientists, academics, and national laboratories and facilities, such as synchrotrons, in securing, engineering, curating, and publishing datasets and machine learning benchmarks that operate on experimental scientific datasets. This will entail working across different scientific domains, including materials, life, environmental, and earth sciences, particle physics, and astronomy, to mention a few. We will include both traditional observational data and computer-generated data.

Although scientific data is widespread, curating, maintaining, and distributing large-scale, useful datasets for public consumption is a challenging process that covers many aspects of data handling (from FAIR principles to distribution to versioning). With large data products, various ML techniques have to be evaluated against different architectures and different datasets. Without these benchmarking efforts, the community has no clear pathway for utilizing these advanced models. We expect that the collection will have significant tutorial value, as examples from one field and one observational or computational experiment can be modified to advance other fields and experiments.

The working group’s goal is to assemble and distribute scientific data sets relevant to a scientific campaign in a systematic manner, and to pose quantifiable targets (“science benchmarks”). A benchmark involves (i) a data set, (ii) objective criteria to meet, and (iii) an example implementation. The objective criteria depend on the scientific problem at hand. The metric should be well defined on the data but could come from a diverse set of measures (one or more of: accuracy targets, top-1 or top-5 error, time to convergence, cross-validation rates, confusion matrices, type-1/type-2 error rates, inference times, surrogate accuracy, control stability measure, etc.). Although we compile system performance numbers across a variety of architectures, our goal is not to measure system performance but rather to improve scientific discovery performance.
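The three parts of a benchmark described above can be sketched as a small data structure. This is only an illustrative sketch, not the working group's actual format: the names `ScienceBenchmark`, `MetricTarget`, `top_k_error`, and `toy_implementation`, the dataset identifier, and the thresholds are all hypothetical, and top-k error stands in for whichever metric a given scientific problem calls for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def top_k_error(ranked: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is not among the k top-ranked predictions."""
    misses = sum(1 for preds, label in zip(ranked, labels) if label not in preds[:k])
    return misses / len(labels)


@dataclass
class MetricTarget:
    """One objective criterion: a named metric and the threshold it must reach (hypothetical)."""
    name: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


@dataclass
class ScienceBenchmark:
    """Hypothetical container for (i) a data set, (ii) criteria, (iii) an example implementation."""
    dataset: str                                            # (i) data set identifier
    targets: List[MetricTarget]                             # (ii) objective criteria
    example_implementation: Callable[[], Dict[str, float]]  # (iii) returns measured metric values

    def evaluate(self) -> Dict[str, bool]:
        """Run the example implementation and report which targets are met."""
        measured = self.example_implementation()
        return {t.name: t.is_met(measured[t.name]) for t in self.targets}


def toy_implementation() -> Dict[str, float]:
    # Made-up ranked class predictions (best first) for four samples, plus true labels.
    ranked = [[2, 0, 1], [1, 2, 0], [0, 1, 2], [2, 1, 0]]
    labels = [2, 2, 1, 0]
    return {
        "top1_error": top_k_error(ranked, labels, 1),
        "top2_error": top_k_error(ranked, labels, 2),
    }


benchmark = ScienceBenchmark(
    dataset="toy-dataset-v1",  # placeholder name, not a real MLCommons dataset
    targets=[
        MetricTarget("top1_error", threshold=0.8, higher_is_better=False),
        MetricTarget("top2_error", threshold=0.2, higher_is_better=False),
    ],
    example_implementation=toy_implementation,
)
report = benchmark.evaluate()
print(report)
```

Keeping the targets separate from the implementation means a different model or architecture can be scored against the same criteria by swapping only the callable.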


  • Develop a number of science benchmarks.
  • Allow for open category benchmarks.
  • Focus on the scientific improvement.

Meeting Schedule

  • Bi-weekly on Wednesday from 8:00-9:00 AM PST, which is 11:00 AM-12:00 PM EST.

How to Join

Use this link to request to join the group/mailing list, and receive the meeting invite:
Science Google Group.
Requests are manually reviewed, so please be patient.

Working Group Resources

Working Group Chairs

  • Geoffrey Fox (CV)
  • Tony Hey (CV)
  • Jeyan Thiyagalingam (CV)