Resources
MLCommons is an open, transparent organization. This resources page provides supporting documentation for the AILuminate benchmark and will be updated continually.
AILuminate v1.0 benchmark supporting resources
Coming Soon! AILuminate Technical Paper
UL Research Institutes Reliability Analysis
Participant observers in the AI Risk and Reliability Working Group collaborated with researchers from the UL Research Institutes to develop a benchmark of benchmark reliability. The 'Benchmarks' Benchmark' (B2), now being expanded to additional LLM benchmarks, is designed to advance the science of LLM benchmarking while identifying the benchmarks operated to the highest standards of reliability. More information on B2 is available from the Digital Safety Research Institute of the UL Research Institutes.
AILuminate v1.0 benchmark launch event talks
Lightning Talks
Assessment Standard – Eleanora Presani, Meta
Prompts and Infrastructure – Heather Frase, Veritech
Evaluator Mechanism – Shaona Ghosh, NVIDIA
Use Cases – Marisa Boston, Reins AI
Integrity – Sean McGregor, UL
Lightning Talks Q&A – All speakers
AILuminate v1.0 Benchmark Launch Event – In Full
Overview of AILuminate v1.0 Benchmark – Peter Mattson, MLCommons President