Science Working Group


Evaluate, organize, curate, and integrate artifacts around applications, models (algorithms), infrastructure, and the three MLCommons pillars: benchmarks, datasets, and best practices. These artifacts are open source and accessible through the MLCommons GitHub. Our input comes from independently funded activities and from experts in industry, government, and research.


Encourage and support the curation of large-scale experimental and scientific datasets and the engineering of ML benchmarks operating on those datasets. The working group will engage with scientists, academics, and national laboratories and large experimental facilities, such as synchrotrons, to secure, engineer, curate, and publish datasets and machine learning benchmarks that operate on experimental scientific data. This entails working across different domains of science, including the material, life, environmental, and earth sciences, particle physics, and astronomy, to name a few. We will include both traditional observational data and computer-generated data.

Although scientific data is widespread, curating, maintaining, and distributing large-scale, useful datasets for public consumption is a challenging process, covering many aspects of data management (from FAIR principles to distribution to versioning). With large data products, different ML techniques must be evaluated against different architectures and different datasets. Without such benchmarking efforts, the community has no clear pathway for adopting these advanced models. We expect the collection to have significant tutorial value, since examples from one field or one observational or computational experiment can be adapted to advance other fields and experiments.

The working group’s goal is to assemble and distribute scientific datasets relevant to a scientific campaign in a systematic manner and to pose quantifiable targets (“science benchmarks”). A benchmark involves (i) a dataset, (ii) objective criteria to meet, and (iii) an example implementation. The objective criteria depend on the scientific problem at hand. The metric should be well defined on the data but may come from a diverse set of measures (one or more of: accuracy targets, top-1 or top-5 error, time to convergence, cross-validation rates, confusion matrices, type-1/type-2 error rates, inference times, surrogate accuracy, control stability measures, etc.). Although we compile system performance numbers across a variety of architectures, our goal is not performance measurement as such, but rather improving scientific discovery performance.
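As a rough illustration, the three benchmark components above could be captured in a small data structure. This is a hypothetical sketch only; the class, field names, and the dummy reference implementation below are illustrative assumptions, not an MLCommons interface:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ScienceBenchmark:
    """Hypothetical container for the three benchmark components:
    (i) a dataset, (ii) objective criteria, (iii) an example implementation."""
    name: str
    dataset_url: str                                  # (i) where the curated dataset lives
    targets: Dict[str, float]                         # (ii) objective criteria, e.g. {"accuracy": 0.9}
    reference_impl: Callable[[], Dict[str, float]]    # (iii) produces measured metric values

    def evaluate(self) -> Dict[str, bool]:
        """Run the reference implementation and report whether each target is met."""
        results = self.reference_impl()
        return {metric: results.get(metric, float("-inf")) >= target
                for metric, target in self.targets.items()}

# Toy stand-in for a real model evaluation on the benchmark dataset.
def dummy_impl() -> Dict[str, float]:
    return {"accuracy": 0.93}

bench = ScienceBenchmark(
    name="example-benchmark",
    dataset_url="https://example.org/dataset",
    targets={"accuracy": 0.9},
    reference_impl=dummy_impl,
)
print(bench.evaluate())  # → {'accuracy': True}
```

Separating the target criteria from the implementation in this way reflects the open-category goal: submitters may swap in their own models while the dataset and objective criteria stay fixed.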


  1. Develop a number of science benchmarks.
  2. Allow for open category benchmarks.
  3. Focus on scientific improvement.

Meeting Schedule

  • Bi-weekly on Wednesdays from 8:05–9:00 AM Pacific.

How to Join

Use this link to request to join the group/mailing list, and receive the meeting invite:
Science Google Group.
Requests are manually reviewed, so please be patient.

Working Group Resources

Working Group Chairs

Geoffrey Fox (CV)

Fox received a Ph.D. in Theoretical Physics from Cambridge University, where he was Senior Wrangler. He is now a Professor in the Biocomplexity Institute & Initiative and the Computer Science Department at the University of Virginia. He previously held positions at Caltech, Syracuse University, Florida State University, and Indiana University, after postdoctoral appointments at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse, Cambridge. He has supervised the Ph.D. theses of 77 students. He has an h-index of 87 with over 42,000 citations. He received the High-Performance Parallel and Distributed Computing (HPDC) Achievement Award and, in 2019, the ACM-IEEE CS Ken Kennedy Award for foundational contributions to parallel computing. He is a Fellow of the APS (Physics) and the ACM (Computing) and works on the interdisciplinary interface between computing and applications. His current focus is on the algorithms and software systems needed for the AI-for-Science revolution.

Jeyan Thiyagalingam (CV)

Thiyagalingam is head of the Scientific Machine Learning (SciML) Group at the STFC Rutherford Appleton Laboratory (RAL), UK. Previously, he was a faculty member in the School of Electrical Engineering, Electronics and Computer Science at the University of Liverpool. Prior to that, he was based at MathWorks and at the University of Oxford, both as a postdoctoral researcher and later as a James Martin Fellow, focusing on high-performance computing and big-data processing. He is also a member of the HiPEAC Network. He has a very strong background in high-performance computing, scientific data processing algorithms, signal processing, and machine learning. He is a Turing Fellow, and his current engagements include positions as an IRIS Delivery Board Member (2020-) and a member of the STFC e-Infrastructure Advisory Group (SEAG, November 2021-).

Juri Papay

Juri Papay is a senior data scientist at the STFC Rutherford Appleton Laboratory, UK. He received a Ph.D. in Computer Science from Warwick University. His current work focuses on benchmarking machine learning applications and investigating the performance of large-scale GPU systems. Previously, he worked as a research scientist at Southampton University on numerous EU-funded projects covering a wide range of topics, such as HPC, security modeling, discrete event simulations, image generation, and semantic research.