Data-centric ML

Accelerate research innovation and increase scientific rigor in machine learning by defining, developing, and operating benchmarks for datasets and data-centric algorithms, facilitated by a flexible ML benchmarking platform.


Purpose


The Data-centric ML Research (DMLR) working group challenges existing ML benchmarking dogma by driving novel approaches to benchmark curation, such as dynamic adversarial data collection. Machine learning benchmarks based on static datasets have well-known issues: they saturate quickly, are susceptible to overfitting, contain exploitable annotator artifacts, and have unclear or imperfect evaluation metrics. This new paradigm of data-centric benchmarking is powered by the Dynabench platform. The key scientific question we investigate in this working group is: can we make faster progress if data is collected dynamically, with humans and models in the loop, rather than in the traditional static way? This paradigm also enables an ecosystem of related ML benchmarks, covering datasets themselves and the algorithms that operate on them. We put these benchmarks into practice through challenges and leaderboards.
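To make the protocol concrete, below is a minimal sketch of one round of dynamic adversarial data collection under this paradigm. It is illustrative only: the `model`, `annotator`, and `verify` callables and the `DynamicBenchmark` class are hypothetical placeholders, not the actual Dynabench API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Illustrative only: these names are hypothetical, not the Dynabench API.
Example = Tuple[str, str]  # (input text, gold label)

@dataclass
class DynamicBenchmark:
    train_set: List[Example] = field(default_factory=list)

    def collection_round(
        self,
        model: Callable[[str], str],        # current model-in-the-loop
        annotator: Callable[[], Example],   # human proposing adversarial examples
        verify: Callable[[Example], bool],  # second annotator validating the label
        budget: int,
    ) -> List[Example]:
        """Keep only verified examples that fool the current model."""
        fooling: List[Example] = []
        for _ in range(budget):
            text, gold = annotator()
            # An example "fools" the model when its prediction differs from
            # the human-assigned gold label; a verifier confirms the label.
            if model(text) != gold and verify((text, gold)):
                fooling.append((text, gold))
        self.train_set.extend(fooling)  # retrain on train_set before the next round
        return fooling
```

In this setup, the model is retrained on the augmented training set after each round, so the benchmark keeps pace with model improvements instead of saturating.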

Deliverables


DataPerf
  • Organize three workshops at major ML conferences per year
  • Recruit more participants to the working group
  • Measure and improve the quality of Common Crawl (based on human annotation)
  • Identify domains for impact challenges:
    • LLMs + Common Crawl
    • Safety
    • Science
  • Explore research challenges beyond the chosen domains
Dynabench
  • Support existing tasks and add three new tasks per year
    • Including a major task, such as Safety
  • Support academic research using Dynabench
  • Deliver product improvements for LLM experiments
Shared objectives:
  • Develop a sustainable funding model for ongoing research
  • Develop an approach to product management for Dynabench
  • Define standard processes and decision-making criteria for accepting new challenges
  • Establish standard processes for promoting new challenges and driving engagement
Meeting Schedule

2nd & 4th Thursdays of every month, 10:30-11:30 AM Pacific.


How to Join and Access DMLR Working Group Resources 


DMLR Working Group Chairs


To contact all DMLR working group chairs, email [email protected].

Lilith Bat-Leah 

Lilith Bat-Leah is Vice President, Data Services at Mod Op, responsible for consulting on use cases for data analytics, data science, and machine learning. Lilith has over 11 years of experience managing, delivering, and consulting on the identification, preservation, collection, processing, review, annotation, analysis, and production of data in legal proceedings. She also has experience leading research and development of AI/machine learning software. She speaks and writes on topics such as the evaluation of machine learning systems, ESI protocols, and discovery of databases. Lilith holds a BSGS in Organization Behavior from Northwestern University, where she graduated magna cum laude. She formerly served as Co-Trustee of the EDRM Analytics and Machine Learning project, as a member of the EDRM Global Advisory Council, as Vice President of the Chicago ACEDS chapter, and as President of the New York Metro ACEDS chapter.

Max Bartolo

Max leads the Command modelling team at Cohere, working on improving the adversarial robustness and overall capabilities of large language models. He is one of the original contributors to the Dynabench working group, which he currently co-leads, and he lectures at UCL.

Praveen Paritosh

Praveen Paritosh is a senior research scientist at Google, leading research on data excellence and evaluation for AI systems. He designed the large-scale human curation systems for Freebase and the Google Knowledge Graph. He was the co-organizer and chair for the AAAI Rigorous Evaluation workshops, Crowdcamp 2016, SIGIRWebQA 2015 workshop, the Crowdsourcing at Scale 2013, the shared task challenge at HCOMP 2013, and Connecting Online Learning and Work at HCOMP 2014, CSCW 2015, and CHI 2016 toward the goal of galvanizing research at the intersection of crowdsourcing, natural language understanding, knowledge representation, and rigorous evaluations for artificial intelligence.