We are thrilled to announce the results and winners of the first MLCommons® AlgoPerf: Training Algorithms benchmark competition, which was designed to find better training algorithms that speed up neural network training across a diverse set of workloads.
The AlgoPerf: Training Algorithms Competition
To make building useful neural network models less time-consuming and costly, we need better training algorithms. The MLCommons Algorithms working group has developed the open-source AlgoPerf: Training Algorithms benchmark to measure how much faster neural networks can be trained through advancements in underlying training algorithms, such as better optimizers or more effective hyperparameter choices.
The AlgoPerf: Training Algorithms benchmark evaluates the training time required by different training algorithms across multiple realistic deep learning workloads when running on a fixed hardware configuration. To encourage generally useful methods, submissions must fully specify any required workload-specific tuning. Participants could choose to submit under two separate tuning rulesets: the external tuning ruleset, designed to simulate tuning with a limited amount of parallel resources, or the self-tuning ruleset, designed to simulate fully automated tuning on a single machine.
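To make the time-to-result idea concrete, below is a minimal, illustrative sketch of the two measurement steps involved: timing how long a training algorithm takes to reach a fixed validation target on a single workload, and aggregating those per-workload times across submissions. This is not the official AlgoPerf scoring code; all names (`time_to_target`, `train_step`, `evaluate`, `relative_scores`) are hypothetical placeholders, and the official benchmark aggregates per-workload training times with performance profiles rather than the simple average shown here.

```python
from typing import Callable, Dict, Optional
import time

# Hypothetical sketch of a single-workload measurement: run the submitted
# training algorithm on fixed hardware and record the wall-clock time it
# needs to reach a workload-specific validation target.
def time_to_target(
    train_step: Callable[[], None],   # one update step of the submitted algorithm
    evaluate: Callable[[], float],    # returns the current validation metric
    target: float,                    # workload-specific validation target
    max_seconds: float,               # per-workload time budget
    eval_every: int = 100,            # evaluate periodically, not after every step
) -> Optional[float]:
    # Assumes a metric where higher is better (e.g., accuracy).
    start = time.perf_counter()
    step = 0
    while time.perf_counter() - start < max_seconds:
        train_step()
        step += 1
        if step % eval_every == 0 and evaluate() >= target:
            return time.perf_counter() - start
    return None  # target not reached within the budget


# Hypothetical aggregation across workloads: compare each submission's
# training time to the fastest submission on every workload, then average
# the resulting ratios. A missed target contributes zero.
def relative_scores(
    times: Dict[str, Dict[str, Optional[float]]],  # submission -> workload -> seconds (or None)
) -> Dict[str, float]:
    workloads = sorted({w for per_workload in times.values() for w in per_workload})
    scores: Dict[str, float] = {}
    for submission, per_workload in times.items():
        ratios = []
        for w in workloads:
            finished = [other[w] for other in times.values() if other.get(w) is not None]
            if not finished:
                continue  # no submission reached the target on this workload
            best = min(finished)
            t = per_workload.get(w)
            ratios.append(best / t if t is not None else 0.0)
        scores[submission] = sum(ratios) / len(ratios) if ratios else 0.0
    return scores
```

Roughly speaking, a higher aggregated score indicates a submission that reached the per-workload targets faster across more of the workloads.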
Participation
The first iteration of the AlgoPerf: Training Algorithms competition attracted 18 submissions (15 of which were scorable) from 10 different teams. Scoring involved over 4,000 individual training runs across the 14 workloads used in the benchmark. Participants included researchers from Concordia University, ELLIS Tübingen, Google, Max Planck Institute for Intelligent Systems, Meta AI, Meta Platforms, Michigan State University, Mila, Samsung AI, UCLA, UT Austin, the University of Cambridge, the University of the West Indies, and the Vector Institute.
The submissions collectively explored many interesting techniques and implementation choices, including submissions using both of our supported frameworks, JAX and PyTorch. As required by the rules, all submissions are released publicly under an Apache 2.0 open-source license.
The Winners & Results
Congratulations to Aaron Defazio (Meta), Alice Yang (Meta), and Konstantin Mishchenko (Samsung AI), who came in first place in the self-tuning ruleset with their “Schedule Free AdamW” submission (see Table 2, below). For the external tuning ruleset (see Table 1, below), first place goes to the “Distributed Shampoo” submission of Hao-Jun Michael Shi, Tsung-Hsien Lee, Anna Cai, Shintaro Iwasaki, Wenyin Fu, Yuchen Hao, and Mike Rabbat (all Meta).
In the external tuning ruleset, five submissions beat the challenging prize-qualification baseline, improving over the state-of-the-art training algorithm. The “Distributed Shampoo” submission trains models an impressive 28% faster than the baseline. “Schedule Free AdamW” was the only submission in the self-tuning ruleset to beat the prize-qualification baseline, training neural networks 8% faster than the baseline.
Congratulations to the winners and all participants for their contributions to advancing neural network training algorithms!
| Score | Submission | Submitters | Institutions | Framework |
|---|---|---|---|---|
| 0.78 | Shampoo Submission | Hao-Jun Shi, Tsung-Hsien Lee, Anna Cai, Shintaro Iwasaki, Wenyin Fu, Yuchen Hao, Mike Rabbat | Meta Platforms | PyTorch |
| 0.71 | Schedule Free AdamW | Aaron Defazio, Alice Yang, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch |
| 0.64 | Generalized Adam | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google DeepMind | JAX |
| 0.63 | Cyclic LR | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Tübingen | PyTorch |
| 0.59 | NadamP | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google DeepMind | JAX |
| 0.57 | Prize Qualification Baseline | | | |
| 0.49 | Amos | Ran Tian | Google DeepMind | JAX |
| 0.47 | Casper Adaptive | Sai Surya Duvvuri, Inderjit Dhillon, Cho-Jui Hsieh | UT Austin, Google, UCLA | JAX |
| 0.37 | Lawa Queue | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Tübingen | PyTorch |
| 0.34 | Lawa EMA | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Tübingen | PyTorch |
| 0.00 | Schedule Free Prodigy | Aaron Defazio, Alice Yang, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch |

Table 1: Leaderboard of the external tuning ruleset.
| Score | Submission | Submitters | Institutions | Framework |
|---|---|---|---|---|
| 0.85 | Schedule Free AdamW | Aaron Defazio, Alice Yang, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch |
| 0.82 | Prize Qualification Baseline | | | |
| 0.33 | NadamW Sequential | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google DeepMind | JAX |
| 0.14 | sinv6_75 | Abhinav Moudgil | Mila, Concordia University | JAX |
| 0.09 | sinv6 | Abhinav Moudgil | Mila, Concordia University | JAX |
| 0.00 | AdamG | Yijiang Pang | Michigan State University | PyTorch |

Table 2: Leaderboard of the self-tuning ruleset.
For a cash prize to be awarded, the competition rules require that at least one other submission outperform the prize-qualification baseline for the relevant ruleset, and that none of the authors of this competing submission share an affiliation with either of the two MLCommons Algorithms working group chairs. This condition was met for the external tuning ruleset, so MLCommons will award a cash prize of $25,000 for the first-place submission. Despite the outstanding performance of the first-place submission in the self-tuning ruleset, the prize requirement was not met there: several competing submissions involved overlapping affiliations with the working group chairs, and the prize-qualification baselines were quite difficult to beat. In designing the AlgoPerf: Training Algorithms benchmark competition, the working group’s goal was, first and foremost, to ensure that any submission that performed well under our rules had achieved something truly impressive, and we are delighted that the first-place submissions in both rulesets produced such exceptional results.
To view the full results of the AlgoPerf: Training Algorithms competition, including the workload-specific performances of each submission, please visit the AlgoPerf results page. We plan to release a paper with a more in-depth discussion of the results after we are done analyzing them in detail.
The next steps for AlgoPerf
The first iteration of AlgoPerf: Training Algorithms demonstrated that neural network training can be accelerated significantly by improving the underlying training algorithms. This iteration was only the first step in driving innovation in machine learning algorithms. Now that we can reliably measure progress in training algorithms, we anticipate rapid progress in the field, both in new research and in better practical methods. The working group is already hard at work planning the future of the benchmark. If you are interested in shaping that future, developing or scoring submissions, or collaborating on research that builds on the benchmark, please consider joining the working group.
Acknowledgments
We extend our sincere thanks to Google for their generous support in providing computational resources to score and evaluate all submissions across the workloads. Our gratitude also goes to the entire MLCommons organization for supporting the Algorithms working group and funding the $50,000 prize pool. Special thanks are due to the members of the Algorithms Working Group who developed, implemented, and managed the benchmark competition. We particularly want to thank Priya Kasimbeg, the Engineering Lead of the working group, who led the scoring process.
About MLCommons and the Algorithms Working Group
MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data.
The AlgoPerf: Training Algorithms benchmark was developed by the MLCommons Algorithms Working Group. Researchers from a variety of academic institutions and industry labs serve on the working group. The group’s mission is to create a set of rigorous and relevant benchmarks to measure neural network training speedups due to algorithmic improvements. For additional information on the Algorithms Working Group, and details on how to become a member or contribute to the benchmarks, please visit the working group website or reach out to [email protected].
This blog has been updated with corrected scores on 10/23/24.