MLPerf™ is a trademark of MLCommons®. If you use it and refer to MLPerf results, you must follow the results guidelines. MLCommons reserves the right to solely determine if uses of its trademark are appropriate.
This benchmark suite measures how fast storage systems can supply training data when a model is being trained. Below is a short summary of the workloads and metrics from the latest round of benchmark results submissions.
Each workload supported by MLPerf Storage is defined by a corresponding MLPerf Training benchmark. The following table summarizes the workloads in this version of the benchmark (the rules remain the official source of truth):
|Area||Task||Model||Nominal Dataset (see below)|
|Vision||Medical image segmentation||3D UNET||KITS 2019 (602x512x512)|
|Language||Language processing||BERT-large||SQuAD v1.1 (max_seq_len=384)|
The dataset is referred to as a “nominal dataset” above because the MLPerf Storage benchmark simulates the above named real datasets using synthetically generated populations of files where the distribution of the size of the files matches the distribution in the real dataset. The size of the dataset used in each benchmark submission is automatically scaled to a size that prevents significant caching of the dataset in the systems actually running the benchmark code.
MLPerf Storage aims to encourage innovation in the architectures of storage systems that support AI/ML projects. It has two Divisions with different purposes:
- The Closed division is intended to allow comparisons between storage systems in an “apples-to-apples” fashion and requires using a fixed set of benchmark tunables and options when running the benchmark.
- The Open division is intended to foster innovation, to show how performance could be increased if some changes were made. As a result, it allows using different data storage formats, access methods, tunables, and options.
See the rules for specifics on what can be changed in each Division.
MLPerf divides benchmark results into categories based on the availability of the storage solution:
- Available on premise - shows results for systems that are available in a customer datacenter.
- Available via the ALCF Discretionary Allocation Program - shows results for systems that are available in the Argonne National Laboratory Discretionary Allocation Program.
- Research, Development, or Internal (RDI) - contains experimental, in development, or internal-use hardware or software.
Each row in the results table is a set of results produced by a single submitter that all use the same software stack and hardware platform in a single deployment method. Each row in the Closed and the Open division tables contains the following information:
- Submitter: The name of the organization that submitted the results.
- System Name: A general description or name of the system under test.
- System Type: One of several categories that characterize the architecture, eg: local storage, parallel filesystem, software defined storage, etc.
- Total Usable Capacity: The amount of user data the solution can store.
- Storage Protocol & Software: The technique the compute node(s) used to access the storage system (eg: which standard protocol or proprietary software) and the name and version of the software running the storage system.
- Hardware: A rough overview of the hardware used by the storage solution - at a minimum, the number of “storage controllers” and the type or technology of storage drives used by the solution.
- Networking: A rough overview of the networking used by the storage solution to connect the compute node(s) to the storage system - at a minimum, the type and speed of the links.
- Number of Compute Nodes: The number of compute nodes that ran the benchmark and accessed the storage.
- Simulated Accelerator Type: The vendor and model number of accelerator that the benchmark was simulating during this test.
Each row in the results table contains the following information for each workload submitted:
- Throughput: This is the maximum performance the storage system was able to deliver while maintaining all the accelerator(s) at 90% utilization or above (ie: no more than 10% of the time were the accelerator(s) idle and waiting for the storage system to deliver data). It is reported as both “samples/second”, a metric that should be intuitively valuable to AI/ML practitioners, and as “MB/s”, a metric that should be intuitively valuable to storage practitioners.
- Number of Simulated Accelerators: The number of simulated accelerators active during this test; ie: how many accelerators of the given type can this storage system keep busy.
- Dataset Size: Since the dataset used in this test was synthesized and must be of a size to prevent significant caching of data in the compute node(s) running the benchmark, the size of the dataset used in this test is reported here.
The rules for the current submissions round are available here.
The source code for the benchmark is available.