Research Working Group

DataPerf Working Group


Drive innovation in ML datasets by defining, developing, and operating benchmarks for datasets and data-centric algorithms.


We are building DataPerf, a benchmark suite for ML datasets and for the algorithms that work with them. Historically, ML research has focused primarily on models and simply used the largest existing dataset for each common ML task, without considering the dataset’s breadth, difficulty, or fidelity to the underlying problem. This neglect of data has led to a range of issues, from data cascades in real applications to the saturation of existing dataset-driven benchmarks for model quality, which impedes research progress. To catalyze increased research focus on data quality and foster data excellence, we created DataPerf: a suite of benchmarks that evaluate the quality of training and test data, and the algorithms for constructing or optimizing such datasets (for example, core set selection or labeling error debugging), across a range of common ML tasks such as image classification. We put the DataPerf benchmarks to use through challenges and leaderboards.


  • Data benchmarking roadmap
  • Data benchmarking rules
  • Data benchmarking evaluation harnesses
  • Data benchmarking reference implementations
  • Leaderboards and challenges on an online platform

Meeting Schedule

Weekly on Thursdays, 9:05-10:00 AM Pacific.

How to Join and Access Working Group Resources

DataPerf Website

Visit the DataPerf website to learn more about the group's data-centric benchmarks.

Working Group Chairs

To contact all DataPerf working group chairs email

Lilith Bat-Leah

Lilith Bat-Leah is Vice President, Data Services at Mod Op, responsible for consulting on use cases for data analytics, data science, and machine learning. Lilith has over 11 years of experience managing, delivering, and consulting on identification, preservation, collection, processing, review, annotation, analysis, and production of data in legal proceedings. She also has experience leading research and development of AI / machine learning software. She speaks and writes about various topics such as evaluation of machine learning systems, ESI protocols, and discovery of databases. Lilith holds a BSGS in Organization Behavior from Northwestern University, where she graduated magna cum laude. She formerly served as Co-Trustee of the EDRM Analytics and Machine Learning project, as a member of the EDRM Global Advisory Council, as Vice President of the Chicago ACEDS chapter, and as President of the New York Metro ACEDS Chapter.

Praveen Paritosh

Praveen Paritosh is a senior research scientist at Google, leading research on data excellence and evaluation for AI systems. He designed the large-scale human curation systems for Freebase and the Google Knowledge Graph. He was the co-organizer and chair for the AAAI Rigorous Evaluation workshops, Crowdcamp 2016, the SIGIRWebQA 2015 workshop, Crowdsourcing at Scale 2013, the shared task challenge at HCOMP 2013, and Connecting Online Learning and Work at HCOMP 2014, CSCW 2015, and CHI 2016, all toward the goal of galvanizing research at the intersection of crowdsourcing, natural language understanding, knowledge representation, and rigorous evaluations for artificial intelligence.