MLCommons

HPC Working Group

Mission

Create MLPerf™ HPC training benchmarks based on science applications to run on large-scale supercomputers.

Purpose

The MLPerf HPC benchmark suite comprises scientific applications that use ML, especially Deep Learning (DL), at HPC scale. These benchmarks help project future system performance and assist in the design and specification of future HPC systems. The suite aims to evaluate behavior unique to HPC applications and to improve our understanding along several dimensions. First, we explore model-system interactions. Second, we characterize and optimize deep learning workloads and identify potential bottlenecks. Third, we quantify the scalability of different deep learning methods and frameworks, using a range of metrics, on hardware-diverse HPC systems.

Deliverables

  1. MLPerf HPC Training benchmarks with rules and definitions
  2. Reference implementations of the MLPerf HPC Training benchmarks
  3. Release roadmap for future versions
  4. Annual publication of benchmark results during the Supercomputing conference

Meeting Schedule

Weekly on Mondays, alternating between 8:05-9:00 AM Pacific and 3:05-4:00 PM Pacific.

How to Join

Use this link to request to join the group/mailing list, and receive the meeting invite:
HPC Google Group.
Requests are manually reviewed, so please be patient.

Working Group Resources

Working Group Chairs

Murali Emani (memani@mlcommons.org) - LinkedIn

Murali Emani is a Computer Scientist in the Data Science group at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. His research interests include scalable machine learning, high-performance computing, and emerging HPC and AI architectures. Previously, he was a Postdoctoral Research Staff Member at Lawrence Livermore National Laboratory. He obtained his PhD from the University of Edinburgh, UK. He was recently awarded a DOE ASCR grant to develop HPC-FAIR, a framework for managing datasets and AI models for analyzing and optimizing scientific applications.

Andreas Prodromou (aprodromou@mlcommons.org) - LinkedIn

Dr. Andreas Prodromou is a Senior Deep Learning Architect at NVIDIA, where he specializes in analyzing the requirements of state-of-the-art AI models, frameworks, and hardware accelerators. He holds a Ph.D. in Computer Science from UC San Diego, with a focus on predicting hardware events in real time using deep learning. In addition to his industry experience, Andreas serves as a reviewer for conferences such as ISCA, MICRO, and ASPLOS, and has contributed to MLPerf HPC as his company's representative for over two years. Beyond his professional pursuits, he is a Second Lieutenant reserve officer in the Greek Cypriot National Guard.