Today, MLCommons, an open engineering consortium, released new results for MLPerf HPC v1.0, the organization's machine learning training performance benchmark suite for high-performance computing (HPC). The MLPerf HPC suite measures the time it takes to train emerging scientific machine learning models to standard quality targets. The latest round introduces a novel metric for aggregate machine learning training throughput for supercomputers, which is a realistic representation of HPC system usage. All the benchmarks in the suite use large scientific simulations to generate training data.

MLPerf HPC is a full system benchmark, testing machine learning models, software, and hardware. MLPerf is a fair and consistent way to track ML performance over time, encouraging competition and innovation to improve performance for the community. Compared to the last submission round, the best benchmark results improved by 4-7X, showing substantial improvement in hardware, software, and system scale.

As in the MLPerf HPC v0.7 round, submissions fall into two divisions: closed and open. Closed submissions use the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit modified models. Within each division, submissions are further classified by availability: commercially available, in preview, or RDI (research, development, and internal).

New Benchmark and Metric to Measure Supercomputer Capabilities

MLPerf HPC v1.0 is a significant update and includes a new benchmark as well as a new performance metric. The OpenCatalyst benchmark predicts the quantum mechanical properties of catalyst systems to discover and evaluate new catalyst materials for energy storage applications. This benchmark uses the OC20 dataset from the Open Catalyst Project, the largest and most diverse publicly available dataset of its kind, with the task of predicting energy and the per-atom forces. The reference model for OpenCatalyst is DimeNet++, a graph neural network (GNN) designed for atomic systems that can model the interactions between pairs of atoms as well as angular relations between triplets of atoms.

MLPerf HPC v1.0 also features a novel weak-scaling performance metric designed to measure the aggregate machine learning capabilities of leading supercomputers. Most large supercomputers run multiple jobs in parallel, for example training multiple ML models. Under the new metric, multiple instances of a model are trained concurrently across a supercomputer to capture the impact on shared resources such as the storage system and interconnect. The benchmark reports both the time-to-train for all the model instances and the aggregate throughput of the HPC system, i.e., the number of models trained per minute. Using the new weak-scaling metric, the MLPerf HPC benchmarks can measure the ML capabilities of supercomputers of any size, from just a handful of nodes to the world’s largest systems.
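As a rough illustration of the throughput figure described above, the sketch below divides the number of model instances by the wall-clock time taken to train all of them. The function name and the sample numbers are hypothetical and are not taken from any submission; the exact measurement procedure is defined by the MLPerf HPC rules, which this sketch does not attempt to reproduce.

```python
def models_per_minute(num_instances: int, total_minutes: float) -> float:
    """Aggregate throughput: model instances trained per minute of wall-clock time.

    Assumes total_minutes is the time needed for ALL concurrently trained
    instances to reach the quality target (an assumption of this sketch).
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return num_instances / total_minutes


# Hypothetical example: 16 instances that all finish within 40 minutes.
throughput = models_per_minute(num_instances=16, total_minutes=40.0)
print(f"{throughput:.2f} models/minute")  # prints "0.40 models/minute"
```

Because the instances run at the same time and share storage and interconnect, this number reflects contention across the whole system rather than the speed of a single training job.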

MLPerf HPC v1.0 results further MLCommons’ goal to provide benchmarks and metrics that level the industry playing field through the comparison of ML systems, software, and solutions. The latest benchmark round received submissions from 8 leading supercomputing organizations and released over 30 results, including 8 using the new weak-scaling metric. Submissions this round included the following organizations: Argonne National Laboratory, the Swiss National Supercomputing Centre, Fujitsu and Japan’s Institute of Physical and Chemical Research (RIKEN), Helmholtz AI (a collaboration of the Jülich Supercomputing Centre at Forschungszentrum Jülich and the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology), Lawrence Berkeley National Laboratory, the National Center for Supercomputing Applications, NVIDIA, and the Texas Advanced Computing Center. In particular, MLCommons would like to congratulate new submitters Argonne National Laboratory, Helmholtz AI, and NVIDIA. To view the results, please visit

“We are excited by the advances in the MLPerf HPC suite and community,” said Steven Farrell, Co-Chair of the MLPerf HPC Working Group. “It’s fantastic to measure such a significant improvement in performance, and we are particularly happy to see a new benchmark and the success of new submitting teams.”

“These benchmarks are aimed at measuring the full capabilities of modern supercomputers,” said Murali Emani, Co-Chair of the MLPerf HPC Working Group. “This iteration of MLPerf HPC will help guide upcoming Exascale systems for emerging machine learning workloads such as AI for science applications.”

Additional information about the HPC v1.0 benchmarks will be available at

About MLCommons

MLCommons is an open engineering consortium with a mission to accelerate machine learning innovation, raise all boats, and increase its positive impact on society. The foundation for MLCommons began with the MLPerf benchmark in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 50+ founding partners, which include global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.

For additional information on MLCommons and details on becoming a Member or Affiliate of the organization, please visit or contact

Press Contact:
Liz Bazini, Bazini Hopp