AILuminate
The MLCommons AILuminate benchmark assesses the safety of general chatbot gen AI systems to help guide development, inform purchasers and consumers, and support standards bodies and policymakers.
Delivering open, useful measures of quality, performance and safety to help guide responsible AI development.
The foundation of MLCommons' benchmark work derives from and builds upon MLPerf, which aims to deliver a representative benchmark suite for AI/ML that fairly evaluates system performance to meet five high-level goals:
Enable fair comparison of competing systems while still encouraging AI innovation.
Accelerate AI progress through fair and useful measurement.
Enforce reproducibility to ensure reliable results.
Serve both the commercial and research communities.
Keep benchmarking effort affordable so all can participate.
Each benchmark suite is defined by a working group community of experts, who establish fair benchmarks for AI systems. The working group defines the AI model to run and the data set it is run against, sets rules on what changes to the model are allowed, and determines how to measure how fast a given hardware platform runs the model. By working within this tripod of model, data, and rules, MLCommons AI systems benchmarks measure not only the speed of hardware, but also the quality of the training data and of the AI model itself.
The MLCommons benchmark suites include:
AILuminate - Assesses the safety of general chatbot gen AI systems to help guide development, inform purchasers and consumers, and support standards bodies and policymakers.
MLPerf Algorithms (AlgoPerf) - Measures how much faster we can train neural network models to a given target performance by changing the underlying training algorithm.
MLPerf Client - Evaluates the performance of large language models (LLMs) and other AI workloads on personal computers, from laptops and desktops to workstations.
MLPerf Inference: Datacenter - Measures how fast systems can process inputs and produce results using a trained model (see the sketch after this list).
MLPerf Inference: Edge - Measures how fast systems can process inputs and produce results using a trained model on edge systems.
MLPerf Mobile - Measures how fast consumer mobile devices with different AI chips and software stacks can process inputs and produce results using a trained model.
MLPerf Tiny - Measures how fast systems can process inputs and produce results using a trained model on ultra-low-power systems.
MLPerf Storage - Measures how fast storage systems can supply training data when a model is being trained.
MLPerf Training - Measures how fast systems can train models to a target quality metric.
MLPerf Training: HPC - Measures how fast systems can train models to a target quality metric running on large-scale supercomputers.
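The inference-oriented suites above all come down to the same basic question: how quickly does a system under test turn inputs into results? The snippet below is a rough, hypothetical sketch of that kind of measurement only; it is not the MLPerf LoadGen harness, and `load_samples` and `run_model` are placeholder stand-ins for a real dataset and system under test.

```python
# Illustrative only: a simplified latency/throughput loop in the spirit of an
# inference benchmark. Not the MLPerf LoadGen harness or its methodology.
import statistics
import time


def load_samples(n):
    """Hypothetical loader returning n preprocessed input samples."""
    return [f"sample-{i}" for i in range(n)]


def run_model(sample):
    """Hypothetical system under test: process one input, return a result."""
    time.sleep(0.001)  # stand-in for real model execution
    return f"result-for-{sample}"


def measure(num_queries=100):
    """Time a stream of queries and report throughput plus latency stats."""
    samples = load_samples(num_queries)
    latencies = []
    start = time.perf_counter()
    for sample in samples:
        t0 = time.perf_counter()
        run_model(sample)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_qps": num_queries / total,
        "median_latency_ms": statistics.median(latencies) * 1000,
        # Approximate nearest-rank 99th-percentile latency.
        "p99_latency_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1000,
    }


if __name__ == "__main__":
    print(measure())
```

In the actual MLPerf Inference benchmarks, queries are issued by the LoadGen harness under defined scenarios and rules, so this loop only conveys the shape of the reported metrics (throughput and tail latency), not the methodology.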
If you are interested in submitting MLPerf benchmark results, please join the appropriate working group. Registration deadlines fall several weeks before submission dates so that all submitters are aware of the benchmark requirements and all necessary resources can be provisioned.
Membership is required for most benchmark working groups (e.g., Training, Inference, Mobile). Some public benchmark working groups have no access requirements; non-members may submit to those benchmarks by first signing a Non-member Test Agreement.
We encourage people to become MLCommons Members if they wish to contribute to MLCommons projects. However, if you are interested in contributing to one of our open source projects and do not think your organization would be a good fit as a Member, please enter your GitHub ID into our subscription form. If your organization is already a Member of MLCommons, you can also use the subscription form to request authorization to commit code in accordance with the CLA.
MLPerf is a registered trademark of MLCommons. The use of the MLPerf results and trademark are described in our Policies.
Both member and non-member trademark agreements are available upon request by contacting [email protected].