Today the MLPerf™ consortium released results for MLPerf Inference v0.7, the second round of submissions to its machine learning inference performance benchmark suite. The suite measures how quickly a trained neural network can process new data for a wide range of applications on a variety of form factors.

MLPerf Inference v0.7 is an exciting milestone for the ML community. The second benchmark round more than doubles the number of applications in the suite and introduces a new dedicated set of MLPerf Mobile benchmarks along with a publicly available smartphone application. The round drew 23 submitting organizations and over 1,200 peer-reviewed results, twice as many as the first round, for systems ranging from smartphones to data center servers. Additionally, this round introduces randomized third-party audits for rules compliance. To see the results, go to mlcommons.org/en/inference-datacenter-07/ and mlcommons.org/en/inference-edge-07/.

The MLPerf Inference v0.7 suite includes four new benchmarks for data center and edge systems:

  • BERT: Bidirectional Encoder Representations from Transformers (BERT) fine-tuned for question answering using the SQuAD 1.1 data set. Given a question input, the BERT language model predicts and generates an answer. This task is representative of a broad class of natural language processing workloads.
  • DLRM: Deep Learning Recommendation Model (DLRM) is a personalization and recommendation model that is trained to optimize click-through rates (CTR). Common examples include recommendation for online shopping, search results, and social media content ranking.
  • 3D U-Net: The 3D U-Net architecture is trained on the BraTS 2019 dataset for brain tumor segmentation. The network identifies whether each voxel within a 3D MRI scan belongs to healthy tissue or to a particular brain abnormality (i.e., GD-enhancing tumor, peritumoral edema, or necrotic and non-enhancing tumor core), and is representative of many medical imaging tasks.
  • RNN-T: Recurrent Neural Network Transducer is an automatic speech recognition (ASR) model that is trained on a subset of LibriSpeech. Given a sequence of speech input, it predicts the corresponding text. RNN-T is representative of widely used speech-to-text systems.
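
As a rough illustration of what "how quickly a trained neural network can process new data" means in practice, the sketch below times a stand-in model on a batch of samples and reports mean latency, 90th-percentile latency, and throughput. It is not the MLPerf load generator or its scoring rules; the names `measure_latency`, `summarize`, and `toy_model` are hypothetical, and the choice of the 90th percentile is an assumption made only for this example.

```python
import statistics
import time


def measure_latency(predict, samples, warmup=2):
    """Time one call to `predict` per sample; return latencies in seconds."""
    for sample in samples[:warmup]:
        predict(sample)  # warm-up calls are not timed
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report mean latency, 90th-percentile latency, and throughput."""
    ordered = sorted(latencies)
    # Nearest-rank percentile: ceil(0.9 * n), computed with integer arithmetic.
    rank = max(1, -(-90 * len(ordered) // 100))
    total = sum(ordered)
    return {
        "mean_s": statistics.fmean(ordered),
        "p90_s": ordered[rank - 1],
        "throughput_per_s": len(ordered) / total if total > 0 else float("inf"),
    }


def toy_model(x):
    # Hypothetical stand-in for a trained network (assumption, not a real model).
    return sum(v * v for v in x)


if __name__ == "__main__":
    data = [[float(i)] * 1000 for i in range(50)]
    print(summarize(measure_latency(toy_model, data)))
```

Reporting a tail percentile alongside the mean reflects the fact that a system serving live requests is often judged by its slower responses, not only its average.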

MLPerf Mobile – A New Open and Community-driven Industry Standard

The second inference round also introduces MLPerf Mobile, the first open and transparent set of benchmarks for mobile machine learning. MLPerf Mobile targets client systems with well-defined and relatively homogeneous form factors and characteristics such as smartphones, tablets, and notebooks. The MLPerf Mobile working group, led by Arm, Google, Intel, MediaTek, Qualcomm Technologies, and Samsung Electronics, selected four new neural networks for benchmarking and developed a smartphone application. The four new benchmarks are available in the TensorFlow, TensorFlow Lite, and ONNX formats, and include:

  • MobileNetEdgeTPU: This is an image classification benchmark; image classification is considered the most ubiquitous task in computer vision. This model deploys the MobileNetEdgeTPU feature extractor, which is optimized with neural architecture search to have low latency and high accuracy when deployed on mobile AI accelerators. This model classifies input images with 224 x 224 resolution into 1000 different categories.
  • SSD-MobileNetV2: Single Shot MultiBox Detector (SSD) with a MobileNetV2 feature extractor is an object detection model trained to detect 80 different object categories in input frames with 300 x 300 resolution. This network is commonly used to identify and track people and objects for photography and live videos.
  • DeepLabv3+ MobileNetV2: This is an image semantic segmentation benchmark. This model is a convolutional neural network that deploys MobileNetV2 as the feature extractor and uses the DeepLabv3+ decoder for pixel-level labeling of 31 different classes in input frames with 512 x 512 resolution. This task can be deployed for scene understanding and many computational photography applications.
  • MobileBERT: The MobileBERT model is a mobile-optimized variant of the larger BERT model that is fine-tuned for question answering using the SQuAD 1.1 data set. Given a question input, the MobileBERT language model predicts and generates an answer. This task is representative of a broad class of natural language processing workloads.
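
To make "pixel-level labeling" in the segmentation benchmark concrete, the sketch below turns per-pixel class scores into a label map by picking the highest-scoring class at each pixel. It assumes the network emits one score per class for every pixel, which is a common convention but an assumption about this benchmark's output format. The function name `label_pixels` and the tiny 2 x 2, three-class input are hypothetical; the benchmark itself uses 512 x 512 frames and 31 classes.

```python
def label_pixels(scores):
    """Map scores[y][x] (a list of per-class scores) to a class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Tiny illustrative input: 2 x 2 pixels, 3 classes (assumed values).
example_scores = [
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]

label_map = label_pixels(example_scores)
print(label_map)
```

The same per-element argmax idea extends to the per-voxel labels produced by 3D segmentation models, with one more spatial dimension.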

“The MLPerf Mobile app is extremely flexible and can work on a wide variety of smartphone platforms, using different computational resources such as CPU, GPUs, DSPs, and dedicated accelerators,” stated Prof. Vijay Janapa Reddi from Harvard University and Chair of the MLPerf Mobile working group. The app comes with built-in support for TensorFlow Lite, providing CPU, GPU, and NNAPI (on Android) inference backends, and also supports alternative inference engines through vendor-specific SDKs. The MLPerf Mobile application will be available for download on multiple operating systems in the near future, so that consumers across the world can measure the performance of their own smartphones.

Additional information about the Inference v0.7 benchmarks will be available at mlcommons.org/en/inference-datacenter-07/ and mlcommons.org/en/inference-edge-07/.