MLCommons

DataPerf Working Group

Mission

Drive innovation in ML datasets by defining, developing, and operating benchmarks for datasets and data-centric algorithms.

Purpose

We are building DataPerf, a benchmark suite for ML datasets and for the algorithms that operate on them. Historically, ML research has focused primarily on models, simply using the largest available dataset for a given task without considering its breadth, difficulty, or fidelity to the underlying problem. This neglect of data has led to a range of issues, from data cascades in real applications to the saturation of existing dataset-driven benchmarks for model quality, which impedes research progress. To catalyze research on data quality and foster data excellence, we created DataPerf: a suite of benchmarks that evaluate the quality of training and test data, as well as algorithms for constructing or optimizing such datasets (e.g., coreset selection or labeling-error debugging), across common ML tasks such as image classification. We put the DataPerf benchmarks to work through challenges and leaderboards.

Deliverables

Data benchmarking roadmap
Data benchmarking rules
Data benchmarking evaluation harnesses
Data benchmarking reference implementations
Leaderboards and challenges on an online platform

Meeting Schedule

Weekly on Thursdays, 12:00-12:30pm Pacific.

Mailing List

dataperf@mlcommons.org

Working Group Resources

Google Drive (Members only)

Working Group Chair Emails

Newsha Ardalani new@fb.com

Praveen Paritosh pkp@google.com

Working Group Chair Bios

Newsha Ardalani is a Research Scientist at Facebook AI Research (FAIR), working on three thrusts of data: data scalability, data perishability, and data valuation, and their implications for large-scale AI system design. She received her Ph.D. from UW-Madison in 2016.


Praveen Paritosh is a senior research scientist at Google, leading research on data excellence and evaluation for AI systems. He designed the large-scale human curation systems for Freebase and the Google Knowledge Graph. He co-organized and chaired the AAAI Rigorous Evaluation workshops, Crowdcamp 2016, the SIGIR WebQA 2015 workshop, Crowdsourcing at Scale 2013, the shared-task challenge at HCOMP 2013, and the Connecting Online Learning and Work workshops at HCOMP 2014, CSCW 2015, and CHI 2016, with the goal of galvanizing research at the intersection of crowdsourcing, natural language understanding, knowledge representation, and rigorous evaluation of artificial intelligence.
