MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark and will be updated continually.

AILuminate v1.0 benchmark supporting resources

Coming Soon! AILuminate Technical Paper

UL Research Institutes Reliability Analysis

Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. The ‘Benchmarks’ Benchmark’ (B2), now being expanded to additional LLM benchmarks, is designed to advance the science of LLM benchmarking while identifying the benchmarks operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.

AILuminate v1.0 benchmark launch event talks

Lightning Talks

Speaker | Event | Talk
Eleanora Presani, Meta | AILuminate v1.0 Benchmark Launch | Assessment Standard
Heather Frase, Veritech | AILuminate v1.0 Benchmark Launch | Prompts and Infrastructure
Shaona Ghosh, NVIDIA | AILuminate v1.0 Benchmark Launch | Evaluator Mechanism
Marisa Boston, Reins AI | AILuminate v1.0 Benchmark Launch | Use Cases
Sean McGregor, UL | AILuminate v1.0 Benchmark Launch | Integrity
All | AILuminate v1.0 Benchmark Launch | Lightning Talks Q&A
All | AILuminate v1.0 Benchmark Launch | AILuminate v1.0 Benchmark Launch Event – In Full
Peter Mattson, MLCommons President | AILuminate v1.0 Benchmark Launch | Overview of AILuminate v1.0 Benchmark
All | AILuminate v1.0 Benchmark Launch | AILuminate Panel Discussion