Benchmark Suite Results

MLPerf Storage

The MLPerf Storage benchmark suite measures how fast storage systems can supply training data while a model is being trained. Below is a short summary of the workloads and metrics from the latest round of benchmark submissions.


Results

MLCommons results are shown in an interactive table so you can explore them. You can apply filters to see just the information you want and click across the top tabs to view the results visually. To see all result details, expand the columns by clicking the “+” icon that appears when you hover over “System Name” and subsequent columns.


Workloads

Each workload supported by MLPerf Storage is defined by a corresponding MLPerf Training benchmark. The following table summarizes the workloads in this version of the benchmark (the rules remain the official source of truth): 

Area       | Task                           | Model      | Nominal Dataset             | Latest Version Available
Vision     | Medical image segmentation     | 3D U-Net   | KiTS 2019 (602x512x512)     | v1.0
Vision     | Image classification           | ResNet50   | ImageNet                    | v1.0
Scientific | Cosmology parameter prediction | CosmoFlow  | CosmoFlow N-body simulation | v1.0
Language   | Language processing            | BERT-large | Wikipedia (2.5KB/sample)    | v0.5

The datasets above are referred to as “nominal” because the MLPerf Storage benchmark simulates each named real dataset with a synthetically generated population of files whose file-size distribution matches that of the real dataset. The size of the dataset used in each benchmark submission is automatically scaled up to prevent significant caching of the dataset in the systems actually running the benchmark code.
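
To make the scaling idea concrete, below is a minimal sketch of how such a rule could be applied. The 5x host-memory multiplier, the node count, and the sample sizes are assumptions made for this example, not values taken from the MLPerf Storage rules or tooling.

    # Minimal sketch (not the official tooling): size a synthetic dataset so it
    # cannot be significantly cached by the client nodes running the benchmark.
    # The 5x host-memory multiplier and the sample sizes below are assumed
    # example values, not figures from the MLPerf Storage rules.

    def required_dataset_bytes(num_client_nodes: int,
                               mem_per_node_gib: float,
                               cache_safety_factor: float = 5.0) -> float:
        """Total synthetic dataset size needed to defeat client-side caching."""
        total_mem_bytes = num_client_nodes * mem_per_node_gib * 2**30
        return cache_safety_factor * total_mem_bytes

    def num_samples_to_generate(dataset_bytes: float, mean_sample_bytes: float) -> int:
        """Number of synthetic samples to generate for a given mean sample size."""
        return int(dataset_bytes / mean_sample_bytes) + 1

    if __name__ == "__main__":
        # Example: 4 client nodes with 512 GiB of RAM each and ~2.5 KB samples.
        dataset_bytes = required_dataset_bytes(num_client_nodes=4, mem_per_node_gib=512)
        samples = num_samples_to_generate(dataset_bytes, mean_sample_bytes=2.5 * 1024)
        print(f"Dataset size: {dataset_bytes / 2**40:.1f} TiB ({samples:,} samples)")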

Divisions

MLPerf aims to encourage innovation in software as well as hardware by allowing submitters to reimplement the reference implementations. There are two Divisions that allow different levels of flexibility during reimplementation:

  • The Closed division is intended to allow comparisons between storage systems in an “apples-to-apples” fashion and requires using a fixed set of benchmark tunables and options when running the benchmark. 
  • The Open division is intended to foster innovation, to show how performance could be increased if some changes were made. As a result, it allows using different data storage formats, access methods, tunables, and options. 
  • See the rules for specifics on what can be changed in each Division. 

Availability

MLPerf divides benchmark results into categories based on the availability of the storage solution: 

  • Available on premise – shows results for systems that are available in a customer datacenter. 
  • Available via the ALCF Discretionary Allocation Program – shows results for systems that are available through the Argonne Leadership Computing Facility (ALCF) Discretionary Allocation Program. 
  • Research, Development, or Internal (RDI) – contains experimental, in development, or internal-use hardware or software. 

Submission Information

Each row in the results table is a set of results produced by a single submitter using the same software stack and hardware platform. Each Closed division row contains the following information:

  • Submitter

    The organization that submitted the results.

  • Storage Protocol & Software

    The technique the compute node(s) used to access the storage system (e.g., which standard protocol or proprietary software) and the name and version of the software running the storage system.

  • System Name

    A general description or name of the system under test.

  • Hardware

    A rough overview of the hardware used by the storage solution – at a minimum, the number of “storage controllers” and the type or technology of storage drives used by the solution.

  • System Type

    One of several categories that characterize the architecture, e.g., local storage, parallel filesystem, software-defined storage, etc.

  • Networking

    A rough overview of the networking used by the storage solution to connect the compute node(s) to the storage system – at a minimum, the type and speed of the links.

  • Total Usable Capacity

    The amount of user data the solution can store.

  • Number of Compute Nodes

    The number of compute nodes that ran the benchmark and accessed the storage.

  • Simulated Accelerator Type

    The vendor and model of the accelerator that the benchmark simulated during this test.

Each row in the results table contains the following information for each workload submitted: 

Throughput

This is the maximum performance the storage system was able to deliver while keeping all of the simulated accelerator(s) at 90% utilization or above (i.e., the accelerator(s) were idle, waiting for the storage system to deliver data, no more than 10% of the time). It is reported both as “samples/second”, a metric that should be intuitively valuable to AI/ML practitioners, and as “MB/s”, a metric that should be intuitively valuable to storage practitioners.
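
As a rough illustration of how the two reporting units and the utilization threshold relate, the sketch below converts a samples/second figure into MB/s for an assumed mean sample size and checks a utilization figure against the 90% threshold. The sample size, rates, and helper names are made up for this example and are not official benchmark parameters.

    # Illustrative only: relate the two reported throughput units and the 90%
    # utilization threshold described above. The sample size and rates below
    # are made-up example values.

    MB = 1000 * 1000  # assumes decimal megabytes for the MB/s figure

    def samples_per_sec_to_mb_per_sec(samples_per_sec: float,
                                      mean_sample_bytes: float) -> float:
        """Convert a samples/second rate into MB/s for a given mean sample size."""
        return samples_per_sec * mean_sample_bytes / MB

    def meets_utilization_threshold(busy_seconds: float,
                                    total_seconds: float,
                                    threshold: float = 0.90) -> bool:
        """True if the simulated accelerators were busy at least `threshold` of the run."""
        return busy_seconds / total_seconds >= threshold

    # Example: 12,000 samples/s of ~140 KB samples is roughly 1,680 MB/s.
    print(samples_per_sec_to_mb_per_sec(12_000, 140 * 1000))
    # Accelerators idle 8% of the run: utilization is 92%, which passes.
    print(meets_utilization_threshold(busy_seconds=92.0, total_seconds=100.0))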

Number of Simulated Accelerators

The number of simulated accelerators active during this test; i.e., how many accelerators of the given type this storage system can keep busy.
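
Since each simulated accelerator of a given type consumes data at roughly the same rate, the aggregate demand on the storage system scales with this count; a small illustrative calculation follows, where the per-accelerator sample rate is an assumed example figure rather than a published benchmark parameter.

    # Illustrative only: aggregate demand on the storage system grows roughly
    # linearly with the number of simulated accelerators it must keep busy.
    # The per-accelerator rate is an assumed example value.

    def aggregate_samples_per_sec(num_accelerators: int,
                                  per_accelerator_samples_per_sec: float) -> float:
        """Total samples/second the storage system must sustain."""
        return num_accelerators * per_accelerator_samples_per_sec

    # Example: 64 simulated accelerators, each consuming ~180 samples/s.
    print(aggregate_samples_per_sec(64, 180.0))  # 11520.0 samples/s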

Dataset Size

Because the dataset used in this test was synthesized and must be sized to prevent significant caching of data in the compute node(s) running the benchmark, its size is reported here.