MLCommons

HPC Working Group

Mission

Create MLPerf HPC training benchmarks based on science applications to run on large-scale supercomputers.

Purpose

The MLPerf HPC benchmark suite includes scientific applications that use machine learning (ML), especially deep learning (DL), at HPC scale. These benchmarks will help project future system performance and assist in the design and specification of future HPC systems. The suite aims to evaluate behavior unique to HPC applications and to improve our understanding across several dimensions. First, we explore model-system interactions. Second, we characterize and optimize deep learning workloads and identify potential bottlenecks. Finally, we quantify the scalability of different deep learning methods, frameworks, and metrics on HPC systems with diverse hardware.

Deliverables

  1. MLPerf HPC Training benchmarks with rules and definitions
  2. Reference implementations of the MLPerf HPC Training benchmarks
  3. Release roadmap for future versions
  4. Annual publication of benchmark results at the Supercomputing (SC) conference

Meeting Schedule

Weekly on Mondays, 8:00-9:00 AM Pacific Time.

Mailing List

hpc@mlcommons.org

Working Group Resources

Google Drive

Working Group Chair Emails

Murali Emani (memani@anl.gov)

Steve Farrell (sfarrell@lbl.gov)

Working Group Chair Bios

Murali Emani is a Computer Scientist in the Data Science group at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. His research interests include scalable machine learning, high performance computing, and emerging HPC and AI architectures. Previously, he was a Postdoctoral Research Staff Member at Lawrence Livermore National Laboratory, US. He obtained his PhD from the University of Edinburgh, UK. He was recently awarded a DOE ASCR grant to develop ‘HPC-FAIR’, a framework to manage datasets and AI models for analyzing and optimizing scientific applications.

Steven Farrell is a Machine Learning Engineer at the NERSC supercomputing center. He supports scientific deep learning workflows on HPC systems through software development, benchmarking, user support, and training. His research interests include applications of deep learning to high energy physics, generative modeling, and learning on structured data such as graphs. He was a member of the ATLAS experiment at CERN for many years, first during his Ph.D. studies at UC Irvine, where he worked on searches for electroweak supersymmetry, and then as a postdoc at Berkeley Lab, where he worked on software development and machine learning applications for analysis and simulation.