MLPerf Training v0.7 results

Today the MLPerf™ consortium released results for MLPerf Training v0.7, the third round of results from its machine learning training performance benchmark suite. MLPerf is a consortium of over 70 companies and researchers from leading universities, and the MLPerf benchmark suites are the industry standard for measuring machine learning performance.

The MLPerf benchmark shows substantial industry progress and growing diversity, including multiple new processors, accelerators, and software frameworks. Compared to the prior submission round, the fastest results on the five unchanged benchmarks improved by an average of 2.7x, reflecting gains in hardware, software, and system scale. This latest training round encompasses 138 results on a wide variety of systems from nine submitting organizations. The Closed division results all use the same model/optimizer(s), while Open division results may use more varied approaches. The results include commercially Available systems, upcoming Preview systems, and RDI systems that are under research or development, or being used internally. To see the results, go to mlcommons.org/en/training-normal-07/.

The MLPerf Training benchmark suite measures the time it takes to train one of eight machine learning models to a standard quality target in tasks including image classification, recommendation, translation, and playing Go.
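As a rough illustration of the time-to-train idea described above, the following Python sketch times a training loop until an evaluation metric reaches a quality target. It is not taken from the MLPerf reference implementations: the function `time_to_train`, the `ToyModel` class, and the 0.75 target are hypothetical names and values chosen for the example, and the official rules define exactly what is and is not included in the timed region.

```python
import time
from typing import Callable, Tuple


def time_to_train(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    quality_target: float,
    max_epochs: int = 100,
) -> Tuple[float, int]:
    """Return (elapsed_seconds, epochs_run) needed to reach quality_target.

    Hypothetical helper: it measures plain wall-clock time around the loop
    and raises RuntimeError if the target is not reached within max_epochs.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        # Stop the clock as soon as the evaluation metric meets the target.
        if evaluate() >= quality_target:
            return time.perf_counter() - start, epoch
    raise RuntimeError("quality target not reached within max_epochs")


class ToyModel:
    """Stand-in for a real model: accuracy rises by 0.25 per epoch."""

    def __init__(self) -> None:
        self.accuracy = 0.0

    def train_one_epoch(self) -> None:
        self.accuracy = min(1.0, self.accuracy + 0.25)

    def evaluate(self) -> float:
        return self.accuracy


if __name__ == "__main__":
    model = ToyModel()
    elapsed, epochs = time_to_train(model.train_one_epoch, model.evaluate, 0.75)
    print(f"reached target after {epochs} epochs in {elapsed:.6f} s")
```

With the toy model, the target of 0.75 is reached after three epochs. A real benchmark run would replace `ToyModel` with an actual training and evaluation pipeline.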

This version of MLPerf includes two new benchmarks and one substantially revised benchmark as follows:

BERT: Bidirectional Encoder Representations from Transformers (BERT) trained with Wikipedia is a leading-edge language model that is used extensively in natural language processing tasks. Given a text input, language models predict related words and are employed as a building block for translation, search, text understanding, answering questions, and generating text.
DLRM: Deep Learning Recommendation Model (DLRM) trained with Criteo AI Lab’s Terabyte Click-Through-Rate (CTR) dataset is representative of a wide variety of commercial applications that touch the lives of nearly every individual on the planet. Common examples include recommendation for online shopping, search results, and social media content ranking.
Mini-Go: A reinforcement learning benchmark similar to Mini-Go from v0.5 and v0.6, but using a full-size 19×19 Go board, which is more reflective of research.

MLPerf is committed to providing benchmarks that reflect the needs of machine learning customers, and is pioneering customer advisory boards to steer future benchmark construction. DLRM is the first benchmark produced using this process. The benchmark was developed with guidance from a board of academics and industry researchers with extensive recommendation expertise. “The DLRM-Terabyte recommendation benchmark is representative of industry use cases and captures important characteristics of model architectures and user-item interactions in recommendation data sets,” stated Carole-Jean Wu, MLPerf Recommendation Benchmark Advisory Board Chair from Facebook AI. Criteo AI Lab’s Terabyte CTR dataset is the largest open recommendation dataset, containing terabyte-sized click logs of four billion user and item interactions over 24 days. “We are very excited about the partnership with MLPerf to form this new Recommendation Benchmark,” stated Flavian Vasile, Principal Researcher from Criteo AI Lab.

Additional information about the Training v0.7 benchmarks will be available at mlcommons.org/en/training-normal-07/.