AI Risk & Reliability

Support community development of AI risk and reliability tests, and organize the definition of research- and industry-standard AI safety benchmarks based on those tests.

Purpose

Our goal is for these benchmarks to guide responsible development, support consumer and purchasing decisions, and enable technically sound, risk-based policy negotiation.

Deliverables

We are a community-based effort and always welcome new members. No previous experience or education is required to join as a volunteer. The working group has four major tasks:

Tests: Curate a pool of safety tests from diverse sources, and facilitate the development of better tests and testing methodologies.
Benchmarks: Define benchmarks for specific AI use-cases, each of which uses a subset of the tests and summarizes the results in a way that enables decision making by non-experts.
Platform: Develop a community platform for safety testing of AI systems that supports registration of tests, definition of benchmarks, testing of AI systems, management of test results, and viewing of benchmark scores.
Governance: Define a set of principles and policies and initiate a broad multi-stakeholder process to ensure trustworthy decision making.

Meeting Schedule

Weekly on Monday from 9:00-9:30 AM Pacific.

AI Risk & Reliability Working Group Projects

MLCommons AILuminate Benchmark Overview

See the AILuminate Safety Benchmark

See the AILuminate Jailbreak Benchmark

Read the AILuminate Assessment Standard

How to Join and Access Resources

To sign up for the group mailing list and receive the meeting invite:

Fill out our subscription form and indicate that you’d like to join the AI Risk & Reliability Working Group.
Associate a Google account with your organizational email address.
Once your request to join the AI Risk & Reliability working group is approved, you’ll be able to access the AI Risk & Reliability folder in the Public Google Drive.

To access the GitHub repositories (public):

If you want to contribute code, please submit your GitHub username through our subscription form.
Visit the GitHub repositories:
- ModelBench
- ModelGauge

AI Risk & Reliability Working Group Workstreams and Leads

Agentic, workstream leads: Sean McGregor, Deepak Nathani, and Lama Saouma
Multimodal, workstream leads: Ken Fricklas and Lora Aroyo
Security, workstream lead: James Goel
Scaling and Analytics, workstream lead: James Ezick

AI Risk & Reliability Working Group Chairs

To contact all AI Risk & Reliability working group chairs, email [email protected].

Joaquin Vanschoren

[email protected]

Joaquin Vanschoren is an Associate Professor of Computer Science at the Eindhoven University of Technology. His research focuses on understanding machine learning algorithms and turning insights into progressively more automated and efficient AI systems. He founded and leads OpenML.org, initiated and chaired the NeurIPS Datasets and Benchmarks track, and has won the Dutch Data Prize, an Amazon Research Award, and an ECML PKDD Best Demo award. He has given over 30 invited talks, was a tutorial speaker at NeurIPS 2018 and AAAI 2021, and has authored over 150 scientific papers, as well as reference books on Automated Machine Learning and Meta-learning. He is editor-in-chief of DMLR, action editor of JMLR, and a moderator for arXiv. He is a founding member of the European AI networks ELLIS and CLAIRE.

Peter Mattson

[email protected]

Peter Mattson is a Senior Staff Engineer at Google. He co-founded and is President of MLCommons®, and co-founded and was General Chair of the MLPerf consortium that preceded it. Previously, he founded the Programming Systems and Applications Group at NVIDIA Research, was VP of software infrastructure for Stream Processors Inc (SPI), and was a managing engineer at Reservoir Labs. His research focuses on understanding machine learning models and data through quantitative metrics and analysis. Peter holds a PhD and MS from Stanford University and a BS from the University of Washington.

Questions?

Reach out to us at [email protected]


Related Insights

MLCommons Unveils New Jailbreak Benchmark, Quantifying AI’s “Resilience Gap” to Adversarial Attacks

MLCommons Builds New Agentic Reliability Evaluation Standard in Collaboration with Industry Leaders

MLCommons Announces Expansion of Industry-Leading AILuminate Benchmark

MLCommons Releases French AILuminate Benchmark Demo Prompt Dataset to Github
