MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark suite and will be continually updated.

AILuminate v1.0 benchmark supporting resources

Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. Now being expanded to additional LLM benchmarks, the ‘Benchmarks’ Benchmark’ (B2) is built to advance the science of LLM benchmarking while identifying the benchmarks that are operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.

AILuminate v1.0 benchmark launch event talks

Lightning Talks

Video: https://www.youtube.com/embed/NGLi3r-HLDM?si=D5nJws-P0Uh-Kuyf

Eleonora Presani, Meta

AILuminate v1.0 Benchmark Launch

Assessment Standard

Video: https://www.youtube.com/embed/f9IzafK1Rxc?si=FO6K1Zi9rDJTTUCQ

Heather Frase, Veritech

AILuminate v1.0 Benchmark Launch

Prompts and Infrastructure

Video: https://www.youtube.com/embed/1OF_w3OPOu0?si=vpr_d1csP-xAcHNs

Shaona Ghosh, NVIDIA

AILuminate v1.0 Benchmark Launch

Evaluator Mechanism

Video: https://www.youtube.com/embed/0T0d2a5zjM4?si=OKDtvMPgtHcxtv1m

Marisa Boston, Reins AI

AILuminate v1.0 Benchmark Launch

Use Cases

Video: https://www.youtube.com/embed/ladWDtqmAUY?si=DSi6TsWIBJEVTSFG

Sean McGregor, UL Research Institutes

AILuminate v1.0 Benchmark Launch

Integrity

Video: https://www.youtube.com/embed/pluecqvdmWo?si=VQbaOHzpge8suGhN

All

AILuminate v1.0 Benchmark Launch

Lightning Talks Q and A

Video: https://www.youtube.com/embed/4gfJAt6nyes?si=IJtzc95wjLAAJN86

All

AILuminate v1.0 Benchmark Launch

AILuminate v1.0 Benchmark Launch Event – In Full

Video: https://www.youtube.com/embed/cFHL7PsMTYo?si=vDk6HD6UMktb9HeP

Peter Mattson, MLCommons President

AILuminate v1.0 Benchmark Launch

Overview of AILuminate v1.0 Benchmark

Video: https://www.youtube.com/embed/Qm4YLl0fj5I?si=Ohv3ynOrENNA-XkA

All

AILuminate v1.0 Benchmark Launch

AILuminate Panel Discussion
