MLPerf Inference v0.5 results

Today the MLPerf™ consortium released 595 inference benchmark results from 14 organizations, following its introduction of the first industry-standard inference benchmarks in June 2019. These benchmarks measure how quickly a trained neural network can process new data for a wide range of applications (autonomous driving, natural language processing, and many more) on a variety of form factors (IoT devices, smartphones, PCs, servers, and a range of cloud solutions). The results of the benchmarks are available on the MLPerf website at mlcommons.org/en/inference-datacenter-05/.

“All released results have been validated by the audits we conducted,” stated Guenther Schmuelling, MLPerf Inference Results Chair from Microsoft. “We were very impressed with the quality of the results. This is an amazing number of submissions in such a short time since we released these benchmarks this summer. It shows that inference is a growing and important application area, and we expect many more submissions in the months ahead.”

“Companies are embracing these benchmark tests to provide their customers with an objective way to measure and compare the performance of their machine learning solutions,” stated Carole-Jean Wu, Inference Co-chair from Facebook. “There are many cost-performance tradeoffs involved in inference applications. These results will be invaluable for companies evaluating different solutions.”

Of the 595 benchmark results released today, 166 are in the Closed Division, which is intended for direct comparison of systems. The results span 30 different systems. The benchmarked systems range from embedded devices and smartphones to large-scale data center systems, with a 4-order-of-magnitude difference in performance and a 3-order-of-magnitude range in estimated power consumption. The remaining 429 results are in the Open Division and show a more diverse range of models, including low-precision implementations and alternative models.

Companies in China, Israel, Korea, the United Kingdom, and the United States submitted benchmark results. The submitting organizations are: Alibaba, Centaur Technology, Dell EMC, dividiti, FuriosaAI, Google, Habana Labs, Hailo, Inspur, Intel, NVIDIA, Polytechnic University of Milan, Qualcomm Technologies, and Tencent.

“As an all-volunteer open-source organization, we want to encourage participation from anyone developing an inference product, even in the research and development stage,” stated Christine Cheng, Inference Co-chair. “You are welcome to join our forum, join working groups, attend meetings, and raise any issues you find.”

According to David Kanter, Inference and Power Measurement Co-chair, “We are very excited about our roadmap. Future versions of MLPerf will include additional benchmarks such as speech-to-text and recommendation, and additional metrics such as power consumption.”

“MLPerf is also developing a smartphone app that runs inference benchmarks for use with future versions. We are actively soliciting help from all our members and the broader community to make MLPerf better,” stated Vijay Janapa Reddi, Associate Professor, Harvard University, and MLPerf Inference Co-chair.

Additional information about these benchmarks is available at mlcommons.org/en/inference-datacenter-05/. The MLPerf Inference Benchmark whitepaper is available at https://arxiv.org/abs/1911.02549. The MLPerf Training Benchmark whitepaper is available at https://arxiv.org/abs/1910.01500.