Resources
MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark and will be updated continually.
AILuminate v1.0 benchmark supporting resources
Coming Soon! AILuminate Technical Paper
UL Research Institutes Reliability Analysis
Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. Now being expanded to additional LLM benchmarks, the ‘Benchmarks’ Benchmark’ (B2) is built to advance the science of LLM benchmarking while identifying the benchmarks that are operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.
AILuminate v1.0 Benchmark Launch Recordings