MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark and will be updated continually.

AILuminate v1.0 benchmark supporting resources

Coming Soon! AILuminate Technical Paper

UL Research Institutes Reliability Analysis

Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. The ‘Benchmarks’ Benchmark’ (B2), now being expanded to additional LLM benchmarks, is designed to advance the science of LLM benchmarking while identifying the benchmarks operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.

AILuminate v1.0 benchmark launch event talks

Lightning Talks

Speaker | Event | Talk
Eleanora Presani, Meta | AILuminate v1.0 Benchmark Launch | Assessment Standard
Heather Frase, Veritech | AILuminate v1.0 Benchmark Launch | Prompts and Infrastructure
Shaona Ghosh, NVIDIA | AILuminate v1.0 Benchmark Launch | Evaluator Mechanism
Marisa Boston, Reins AI | AILuminate v1.0 Benchmark Launch | Use Cases
Sean McGregor, UL | AILuminate v1.0 Benchmark Launch | Integrity
All | AILuminate v1.0 Benchmark Launch | Lightning Talks Q&A
All | AILuminate v1.0 Benchmark Launch | AILuminate v1.0 Benchmark Launch Event – In Full
Peter Mattson, MLCommons President | AILuminate v1.0 Benchmark Launch | Overview of AILuminate v1.0 Benchmark
All | AILuminate v1.0 Benchmark Launch | AILuminate Panel Discussion