HPC Working Group
Mission
Create MLPerf™ HPC training benchmarks based on science applications to run on large-scale supercomputers.
Purpose
The MLPerf HPC benchmark suite includes scientific applications that use ML, especially deep learning (DL), at HPC scale. These benchmarks help project future system performance and assist in the design and specification of future HPC systems. The suite aims to evaluate behavior unique to HPC applications and improve our understanding across several dimensions. First, we explore model-system interactions. Second, we characterize and optimize deep learning workloads and identify potential bottlenecks. Finally, we quantify the scalability of different deep learning methods, frameworks, and metrics on hardware-diverse HPC systems.
Deliverables
- MLPerf HPC Training benchmarks with rules and definitions
- Reference implementations of the MLPerf HPC Training benchmarks
- Release roadmap for future versions
- Benchmark results published annually at the Supercomputing conference
Meeting Schedule
Weekly on Mondays, alternating between 8:05-9:00 AM Pacific and 3:05-4:00 PM Pacific.
How to Join
Use this link to request to join the group/mailing list and receive the meeting invite:
HPC Google Group.
Requests are manually reviewed, so please be patient.
Working Group Resources
- Shared documents and meeting minutes:
  - Associate a Google account with your e-mail address.
  - Ask to join our Public Google Group.
  - Once approved, go to the HPC folder in our Public Google Drive.
- GitHub (public):
  - If you want to contribute code, please sign our CLA first.
  - GitHub link.
Working Group Chairs
Murali Emani (memani@mlcommons.org) - LinkedIn
Murali Emani is a Computer Scientist in the Data Science group at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. His research interests include scalable machine learning, high-performance computing, and emerging HPC and AI architectures. Previously, he was a Postdoctoral Research Staff Member at Lawrence Livermore National Laboratory. He obtained his PhD from the University of Edinburgh, UK. He was recently awarded a DOE ASCR grant to develop a framework, 'HPC-FAIR', to manage datasets and AI models for analyzing and optimizing scientific applications.
Andreas Prodromou (aprodromou@mlcommons.org) - LinkedIn
Dr. Andreas Prodromou is a Senior Deep Learning Architect at NVIDIA, where he specializes in analyzing the requirements of state-of-the-art AI models, frameworks, and hardware accelerators. He holds a Ph.D. in Computer Science from UC San Diego, with a focus on predicting hardware events in real-time using deep learning. In addition to his industry experience, Andreas serves as a reviewer for esteemed conferences such as ISCA, MICRO, and ASPLOS, and has contributed to MLPerf HPC as his company's representative for over two years. Beyond his professional pursuits, he is a Second Lieutenant reserve officer for the Greek Cypriot National Guard.