The artificial intelligence industry has reached an inflection point. As organizations move AI from experimental pilots into mission-critical operations across finance, healthcare, and manufacturing, one question has emerged as the central barrier to enterprise adoption: how do we verify that these systems are reliable?
Today, the MLCommons Association (backed by a coalition including KPMG, Google, Microsoft, and Qualcomm) is announcing the creation of the AILuminate Global Assurance Program (AIL GAP). The program represents a significant development: a commitment to build a structured, data-driven mechanism for evaluating AI reliability that bridges the persistent gap between high-level standards and policy frameworks on one side and on-the-ground technical performance on the other.
Why This Matters for Risk and Compliance
Unlike traditional software, AI models produce probabilistic outputs: results that vary with data, context, and configuration. Existing standards, such as ISO/IEC 42001, provide essential procedural and governance-level requirements, yet they do not specify the empirical metrics needed to demonstrate that a given model performs within acceptable risk thresholds. Put another way: how do we appropriately evidence adherence to those standards? The AILuminate Global Assurance Program aims to address that gap directly.
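To make the distinction concrete, here is a minimal, purely illustrative sketch of why probabilistic systems call for repeated empirical measurement rather than one-off spot checks. The "model," the prompts, and the 30% threshold are all invented for illustration; they do not reflect any actual AILuminate benchmark or scoring rule.

```python
import random

def flaky_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a generative model: the same prompt
    can yield different outputs from run to run."""
    if "risky" in prompt:
        return rng.choice(["safe response", "unsafe response"])
    return "safe response"

def empirical_unsafe_rate(prompts, trials=200, seed=0):
    """Estimate the unsafe-response rate by sampling each prompt repeatedly,
    rather than judging the model from a single deterministic run."""
    rng = random.Random(seed)
    unsafe = sum(
        flaky_model(p, rng) == "unsafe response"
        for _ in range(trials)
        for p in prompts
    )
    return unsafe / (trials * len(prompts))

RISK_THRESHOLD = 0.30  # invented acceptable-risk ceiling, for illustration only

rate = empirical_unsafe_rate(["benign question", "risky question"])
print(f"estimated unsafe rate: {rate:.2f}",
      "within threshold" if rate <= RISK_THRESHOLD else "exceeds threshold")
```

A single invocation of this toy model might look perfectly safe; only the measured rate over many trials reveals whether it sits inside the chosen risk threshold, which is the kind of evidence procedural standards alone cannot supply.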
Three Pillars of Assurance
The program will be organized around three core pillars, each targeting a distinct need within the AI lifecycle.
Build: Benchmarking-as-a-Service (BaaS). AI developers will be able to integrate proven, private, non-saturated benchmarks directly into their pre-release workflows. The service will offer both practice testing to guide iterative model tuning and official testing to produce verified performance results. For compliance teams, this means that risk assessment will be tightly integrated into the AI development life cycle, both pre- and post-launch.
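The pre-release workflow described above might be wired into a release pipeline roughly as follows. This is a hypothetical sketch only: the BaaS interface has not been published, so the `BenchmarkResult` type, the score scale, and the 0.8 floor are placeholders showing where a benchmark gate could sit, not the program's actual API.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Placeholder for one hazard-category score from a benchmark run.
    The 0-1 'higher is safer' scale is an assumption for this sketch."""
    hazard_category: str
    score: float

def gate_release(results, minimum=0.8):
    """Block a release candidate if any hazard category falls below the floor.
    Returns (approved, list of failing categories)."""
    failing = [r.hazard_category for r in results if r.score < minimum]
    return (len(failing) == 0, failing)

# A "practice" run used for iterative tuning before an official, verified run.
practice_run = [
    BenchmarkResult("violent content", 0.92),
    BenchmarkResult("privacy", 0.74),
]
approved, failing = gate_release(practice_run)
print("release approved" if approved else f"release blocked on: {failing}")
```

The point of the sketch is the placement, not the numbers: practice runs inform tuning during development, while an official run producing verified results would back the compliance record at launch.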
Show: The AILuminate Risk Label. The program aims to distill benchmark results into a clear risk label designed for decision-makers and non-specialists. This label should translate technical metrics into a format that supports corporate governance, procurement decisions, and alignment with higher-level standards, giving risk professionals a consistent, comparable indicator of model safety.
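The distillation step can be pictured as a simple mapping from measured metrics to an ordinal label. The cut points and label names below are invented for illustration; the program's actual grading scale and methodology are not described here.

```python
def risk_label(violating_rate: float) -> str:
    """Map a measured violating-response rate to a coarse, comparable label
    a non-specialist can act on. Thresholds are illustrative assumptions."""
    if violating_rate < 0.01:
        return "Low risk"
    if violating_rate < 0.05:
        return "Moderate risk"
    return "Elevated risk"

# Three hypothetical models with different measured rates:
for rate in (0.005, 0.03, 0.12):
    print(f"violating rate {rate:.3f} -> {risk_label(rate)}")
```

Whatever the real scale turns out to be, the value for procurement and governance comes from the consistency of the mapping: two models graded under the same rubric can be compared directly, without re-interpreting the underlying metrics.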
Scale: The AILuminate Global Framework. Recognizing that AI deployment is global, the program will include a technological framework for developing region- and language-specific benchmarks and for adapting to industry-specific needs. This will ensure that standards remain relevant and enforceable across jurisdictions, a critical consideration for organizations operating in multiple regulatory environments.
How Your Organization Can Participate
We are designing the AILuminate Global Assurance Program to be an open, evolving initiative. Its technical specifications and benchmarks will be iterative by design, intended to keep pace with the rapid advancement of AI capabilities. There are several concrete ways to engage:
- Risk and compliance professionals can join the Global Assurance Program to contribute feedback and help shape standards as they mature. Please complete the AILuminate Global Assurance Program Interest Form, and we will follow up with details on when and how to participate.
- Organizations evaluating or deploying AI systems can begin referencing AILuminate benchmarks and risk labels as part of their vendor assessment and due diligence processes.
- Development teams interested in integrating Benchmarking-as-a-Service into their model validation pipelines should contact [email protected] for more information.
- Organizations with regional or sector-specific expertise are encouraged to collaborate on extending the Global Framework to address local regulatory needs by completing the AIRR Contributor Interest form and checking the boxes under “multicultural” in the workstream interest section.
The program operates under the MLCommons Association, and information on joining or contributing is available at mlcommons.org.
The Path Forward
History has shown that industries mature when they adopt shared, transparent standards for safety and reliability. The AILuminate Global Assurance Program represents a deliberate step toward replicating that trajectory for artificial intelligence. For risk and compliance professionals, this is the moment to step off the sidelines and actively help define the standards that will govern AI accountability for years to come.