MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark and will be updated continually.

AILuminate v1.0 benchmark supporting resources

Coming Soon! AILuminate Technical Paper

UL Research Institutes Reliability Analysis

Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. Now being expanded to additional LLM benchmarks, the ‘Benchmarks’ Benchmark’ (B2) is built to advance the science of LLM benchmarking while identifying the benchmarks that are operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.

AILuminate v1.0 Benchmark Launch Recordings