Training: HPC
Overview
This benchmark suite measures how fast systems can process inputs and produce results using a trained model. Below is a short summary of the current benchmarks and metrics. Please see the MLPerf Inference benchmark paper for a detailed description of the motivation and guiding principles behind the benchmark suite.
Scenarios and Metrics
To enable representative testing of a wide variety of inference platforms and use cases, MLPerf defines four scenarios, described below. A given scenario is evaluated by a standard load generator that issues inference requests in a particular pattern and measures a specific metric.
v2.0 Results
MLPerf™ is a trademark of MLCommons®. If you use it and refer to MLPerf results, you must follow the results guidelines. MLCommons reserves the right to solely determine if uses of its trademark are appropriate.
Scenario | Query generation | Duration | Samples per query | Latency constraint | Tail latency percentile | Performance metric |
---|---|---|---|---|---|---|
Single stream | LoadGen sends next query as soon as SUT completes the previous query | 1024 queries and 60 seconds | 1 | None | 90% | 90%-ile measured latency |
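As a concrete illustration of the single-stream pattern, the sketch below issues one query at a time, sends the next query as soon as the previous one completes, and reports the 90th-percentile latency. This is a minimal stand-in written for clarity, not the official LoadGen; `run_inference` and `samples` are hypothetical placeholders for the system under test and its input set.

```python
import time
import statistics


def single_stream_run(run_inference, samples, min_queries=1024, min_duration_s=60.0):
    """Minimal single-stream harness sketch: send the next query as soon as
    the SUT completes the previous one, then report 90th-percentile latency.

    `run_inference` is a hypothetical SUT callable; the real benchmark uses
    the MLPerf LoadGen library rather than this loop.
    """
    latencies = []
    start = time.monotonic()
    i = 0
    # Run until both minimums are met: 1024 queries and 60 seconds.
    while len(latencies) < min_queries or time.monotonic() - start < min_duration_s:
        sample = samples[i % len(samples)]
        t0 = time.monotonic()
        run_inference(sample)  # one sample per query, no latency constraint
        latencies.append(time.monotonic() - t0)
        i += 1
    # Performance metric: 90%-ile measured latency (lower is better).
    return statistics.quantiles(latencies, n=10)[-1]
```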
Edge
Benchmarks
Each Edge benchmark is defined by a dataset, a quality target, and a reference implementation model:
Area | Benchmark | Dataset | Quality Target | Reference Implementation Model |
---|---|---|---|---|
Scientific | Climate segmentation | CAM5+TECA simulation | IOU 0.82 | DeepCAM |
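The IOU quality target above is intersection over union between predicted and reference segmentation masks. Below is a minimal NumPy sketch of the standard metric, assuming integer per-pixel class labels; the reference implementation defines the exact variant used for the 0.82 target.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for dense segmentation.

    pred, target: integer arrays of per-pixel class labels, same shape.
    A sketch of the standard metric, not the reference implementation.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(intersection / union)
    return float(np.mean(ious))
```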
Divisions
MLPerf aims to encourage innovation in software as well as hardware by allowing submitters to reimplement the reference implementations. MLPerf has two Divisions that allow different levels of flexibility during reimplementation. The Closed division is intended to compare hardware platforms or software frameworks “apples-to-apples” and requires using the same model as the reference implementation. The Open division is intended to foster innovation and allows using a different model or retraining.
Availability
MLPerf divides benchmark results into Categories based on availability.
- Available systems contain only components that are available for purchase or for rent in the cloud.
- Preview systems must be submittable as Available in the next submission round.
- Research, Development, or Internal (RDI) systems contain experimental, in-development, or internal-use hardware or software.
Submission Information
Each row in the results table is a set of results produced by a single submitter using the same software stack and hardware platform. Each Closed division row contains the following information:
Submitter
The organization that submitted the results.
Software
The ML framework and primary ML hardware library used.
System
General system description.
Benchmark Results
Results for each benchmark as described above.
Processor and count
The type and number of CPUs used, if CPUs perform the majority of ML compute.
Details
Link to metadata for submission.
Accelerator and count
The type and number of accelerators used, if accelerators perform the majority of ML compute.
Code
Link to code for submission.
Each Open division row may add the following information:
Model used
The model used to produce the results, which may or may not match the Closed division requirement.
Notes
Arbitrary notes from the submitter.
For results with power measurement, each row adds columns for each benchmark containing the following:
- System power (for Server and Offline scenarios), or
- Energy per stream (for Single stream and Multiple stream scenarios)
Power Measurement
These metrics are computed from the measured average AC power (and hence energy) consumed by the entire system for the duration of the performance measurement of a benchmark (e.g., a single network under a single scenario); the AC power is measured at the wall.
The measured power is only valid for the accompanying benchmark. MLPerf Power is only capable of measuring and validating the full system power. Any other references to power in any description (e.g., a TDP configuration, a power supply rating) are not measured or validated by MLCommons.
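As a rough illustration of how the two power metrics relate: average AC power over the run, multiplied by the run duration, gives total energy, and dividing that by the number of completed streams gives energy per stream. A minimal sketch, assuming evenly spaced wall-power samples; the function names are illustrative, not part of MLPerf Power.

```python
def average_system_power_watts(power_samples_w):
    """System power metric (Server/Offline scenarios): mean of
    wall-side AC power samples taken over the benchmark run."""
    return sum(power_samples_w) / len(power_samples_w)


def energy_per_stream_joules(power_samples_w, sample_interval_s, num_streams):
    """Energy-per-stream metric (Single/Multiple stream scenarios):
    average power times run duration, divided by completed streams."""
    duration_s = len(power_samples_w) * sample_interval_s
    return average_system_power_watts(power_samples_w) * duration_s / num_streams
```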
Rules
The rules are here.
Reference implementations
The reference implementations for the benchmarks are here.