Machine learning innovation to benefit everyone.
MLPerf Results Show Advances in Machine Learning Inference Performance and Efficiency
MLCommons’ latest benchmarks illustrate focus on energy efficiency and up to 3.3X performance gains
Multilingual Spoken Words Corpus - 50 Languages and Over 23 Million Audio Keyword Examples
Giving the gift of voice to 5 billion people in 50 languages
Introducing the People’s Speech dataset - 30,000+ hours of diverse speech data to drive ML innovation
Bigger, better, and for everyone
MLCommons Unveils Open Datasets and Tools to Drive Democratization of Machine Learning
Engineering consortium advances data-centric AI with pioneering datasets and tools
MLCommons aims to accelerate machine learning innovation to benefit everyone. Machine learning has tremendous potential to save lives in areas like healthcare and automotive safety and to improve information access and understanding through technologies like voice interfaces, automatic translation, and natural language processing. However, machine learning is completely unlike conventional software -- developers train an application rather than program it -- and requires a whole new set of techniques analogous to the breakthroughs in precision measurement, raw materials, and manufacturing that drove the industrial revolution.
MLCommons aims to answer the needs of the nascent machine learning industry through open, collaborative engineering in three areas:
Benchmarks provide consistent measurements of accuracy, speed, and efficiency. Consistent measurements enable engineers to design reliable products and services, and enable researchers to compare innovations and choose the best ideas to drive the solutions of tomorrow.
Datasets are the raw materials for all of machine learning. Models are only as good as the data they are trained on. Academics and entrepreneurs in particular depend on public datasets to create new technologies and new companies.
Best Practices empower researchers and engineers to more easily exchange models, reproduce experiments, and build applications that leverage machine learning. Improving best practices accelerates progress in, and grows the market for, machine learning.
The People’s Speech Dataset is among the world’s largest English speech recognition corpora licensed for academic and commercial use under CC-BY-SA and CC-BY 4.0. It includes 30,000+ hours of transcribed English speech from a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and, crucially, is available under a permissive license. Just as ImageNet catalyzed machine learning for vision, the People’s Speech will unleash innovation in speech research and products that are available to users across the globe.
MLCube is a set of best practices for creating ML software that can just "plug-and-play" on many different systems. MLCube makes it easier for researchers to share innovative ML models, for a developer to experiment with many different models, and for software companies to create infrastructure for models. It creates opportunities by putting ML in the hands of more people. MLCube isn’t a new framework or service; MLCube is a consistent interface to machine learning models in containers like Docker. Models published with the MLCube interface can be run on local machines, on a variety of major clouds, or in Kubernetes clusters -- all using the same code.
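To make the idea of "one interface, many platforms" concrete, here is a minimal Python sketch. It is not the MLCube API: the names `CubeTask`, `RUNNERS`, and `plan`, along with the image name and paths, are hypothetical and exist only for illustration. The sketch shows one task description being handed unchanged to different platform-specific runners, which is the general pattern the paragraph above describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CubeTask:
    """Hypothetical description of one task exposed by a containerized model."""
    image: str  # container image name (made up for this sketch)
    task: str   # task name, e.g. "train"
    mounts: Dict[str, str] = field(default_factory=dict)  # host path -> container path


def docker_command(cube: CubeTask) -> List[str]:
    """Build a `docker run` argument list for the task (nothing is executed)."""
    cmd = ["docker", "run", "--rm"]
    for host_path, container_path in cube.mounts.items():
        cmd += ["-v", f"{host_path}:{container_path}"]
    return cmd + [cube.image, cube.task]


def dry_run_command(cube: CubeTask) -> List[str]:
    """Stand-in runner that only reports what would be run."""
    return ["echo", f"would run task '{cube.task}' from image '{cube.image}'"]


# Each platform maps to a function that turns the same CubeTask into a command.
# A real system would add cloud or Kubernetes runners here; they are omitted
# because their details are outside what this sketch can state with confidence.
RUNNERS: Dict[str, Callable[[CubeTask], List[str]]] = {
    "docker": docker_command,
    "dry-run": dry_run_command,
}


def plan(cube: CubeTask, platform: str) -> List[str]:
    """Return the command a given platform would use for this task."""
    if platform not in RUNNERS:
        raise ValueError(f"unknown platform: {platform!r}")
    return RUNNERS[platform](cube)


if __name__ == "__main__":
    cube = CubeTask(
        image="example/model:latest",
        task="train",
        mounts={"/data": "/workspace/data"},
    )
    for platform in RUNNERS:
        print(platform, "->", " ".join(plan(cube, platform)))
```

The task description stays fixed while only the runner changes, so supporting a new platform means adding one entry to `RUNNERS` rather than touching the model's own code.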