Safety Resources - MLCommons

Resources

MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark suite and will be continually updated.

AILuminate v1.0 benchmark supporting resources

MLCommons AILuminate Benchmark Assessment Standard

MLCommons AILuminate Technical Paper
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

UL Research Institutes Reliability Analysis

Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. Now being expanded to additional LLM benchmarks, the “Benchmarks’ Benchmark” (B2) is built to advance the science of LLM benchmarking while identifying the benchmarks operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.

MLCommons AILuminate Benchmarking Policy

MLCommons AILuminate Messaging Guidelines

AILuminate v1.0 benchmark launch event talks

Lightning Talks

Video: https://www.youtube.com/embed/NGLi3r-HLDM?si=D5nJws-P0Uh-Kuyf

Eleonora Presani, Meta

AILuminate v1.0 Benchmark Launch

Assessment Standard

Video: https://www.youtube.com/embed/f9IzafK1Rxc?si=FO6K1Zi9rDJTTUCQ

Heather Frase, Veraitech

AILuminate v1.0 Benchmark Launch

Prompts and Infrastructure

Video: https://www.youtube.com/embed/1OF_w3OPOu0?si=vpr_d1csP-xAcHNs

Shaona Ghosh, NVIDIA

AILuminate v1.0 Benchmark Launch

Evaluator Mechanism

Video: https://www.youtube.com/embed/0T0d2a5zjM4?si=OKDtvMPgtHcxtv1m

Marisa Boston, Reins AI

AILuminate v1.0 Benchmark Launch

Use Cases

Video: https://www.youtube.com/embed/ladWDtqmAUY?si=DSi6TsWIBJEVTSFG

Sean McGregor, UL Research Institutes

AILuminate v1.0 Benchmark Launch

Integrity

Video: https://www.youtube.com/embed/pluecqvdmWo?si=VQbaOHzpge8suGhN

All

AILuminate v1.0 Benchmark Launch

Lightning Talks Q and A

Video: https://www.youtube.com/embed/4gfJAt6nyes?si=IJtzc95wjLAAJN86

All

AILuminate v1.0 Benchmark Launch

AILuminate v1.0 Benchmark Launch Event – In Full

Video: https://www.youtube.com/embed/cFHL7PsMTYo?si=vDk6HD6UMktb9HeP

Peter Mattson, MLCommons President

AILuminate v1.0 Benchmark Launch

Overview of AILuminate v1.0 Benchmark

Video: https://www.youtube.com/embed/Qm4YLl0fj5I?si=Ohv3ynOrENNA-XkA

All

AILuminate v1.0 Benchmark Launch

AILuminate Panel Discussion

AILuminate v0.5 proof-of-concept resources

v0.5 POC AILuminate Benchmark Technical Whitepaper

v0.5 POC Technical Glossary

v0.5 POC Taxonomy of Hazards

v0.5 POC Test Specification Schema
