MLPerf Mobile v6.0: New GenAI Benchmarks for On-Device LLMs

MLCommons® today announced the release of MLPerf® Mobile v6.0, introducing new generative AI benchmark tests for running large language models (LLMs) on Android devices. These tests join a comprehensive suite of benchmarks built into the MLPerf Mobile app, including tests for image generation, object detection, super resolution, and more.

NEW ON-DEVICE LLM BENCHMARKS
MLPerf Mobile v6.0 adopts these models for its new LLM benchmarks:

Llama 3.2 1B Instruct
Llama 3.2 3B Instruct
Llama 3.1 8B Instruct

The models are asked to process requests selected from the TinyMMLU and IFEval datasets to quantify the performance and accuracy of on-device AI inference.
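As a rough illustration of what quantifying performance and accuracy can mean for multiple-choice prompts such as those in TinyMMLU, the sketch below computes an accuracy score and an aggregate token throughput from a handful of made-up results. It is not the MLPerf Mobile scoring code: the `Sample` record, its field names, and the choice of tokens per second as the performance metric are assumptions made for this example. IFEval-style instruction-following checks would need different scoring logic.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical benchmark result for a multiple-choice prompt."""
    predicted: str         # answer letter the model produced, e.g. "B"
    expected: str          # reference answer letter from the dataset
    tokens_generated: int  # number of output tokens
    seconds: float         # wall-clock generation time in seconds


def accuracy(samples: list[Sample]) -> float:
    """Fraction of samples whose predicted answer matches the reference."""
    if not samples:
        return 0.0
    correct = sum(
        s.predicted.strip().upper() == s.expected.strip().upper()
        for s in samples
    )
    return correct / len(samples)


def tokens_per_second(samples: list[Sample]) -> float:
    """Aggregate throughput: total output tokens over total generation time."""
    total_time = sum(s.seconds for s in samples)
    if total_time <= 0:
        return 0.0
    return sum(s.tokens_generated for s in samples) / total_time


# Made-up results, for illustration only (not measured data).
results = [
    Sample("A", "A", 12, 0.5),
    Sample("c", "C", 8, 0.25),
    Sample("B", "D", 20, 1.25),
    Sample("D", "D", 10, 0.5),
]
print(f"accuracy: {accuracy(results):.2f}")
print(f"throughput: {tokens_per_second(results):.1f} tokens/s")
```

Reporting the two numbers separately keeps a fast but inaccurate run distinguishable from a slow but accurate one.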

The LLM tests can run on the CPU of any device with sufficient memory, without tailored acceleration. Additionally, this release supports NPU-accelerated execution of the Llama 3.1 8B Instruct model on Qualcomm Snapdragon 8 Elite Gen 5 SoCs. The working group plans to expand LLM acceleration support to more devices and platforms in the future.

EXPANDED SOC SUPPORT AND BROAD AVAILABILITY
In keeping with the MLPerf Mobile working group’s commitment to integrating support for new devices rapidly, the v6.0 release adds support for devices based on the MediaTek Dimensity 9500 Series.

Support for the following chips is also updated in this release:

Qualcomm Snapdragon 8 Elite Gen 5
Samsung Exynos 2600

The app also continues to support NPU-accelerated execution on a wide range of mobile devices.

The MLPerf Mobile app is available via the Google Play store, the Apple App Store, and the MLPerf Mobile GitHub repo. More details about device support are available at each of these distribution points. The GitHub repo also contains the full, open-source code for the MLPerf Mobile app, released under the permissive Apache 2.0 license.

ABOUT MLCOMMONS
MLCommons is an open engineering consortium with a mission to make machine learning better for everyone. The organization develops industry-leading benchmarks, datasets, and best practices spanning cloud, data center, edge, and client AI systems. Its MLPerf benchmark suite is widely recognized as the standard for measuring machine learning performance.