Benchmarking and co-design are essential for driving optimization and innovation in Artificial Intelligence (AI) models, software, and next-generation hardware. Full-stack benchmarks, such as the MLPerf™ suite, play a crucial role in enabling fair comparisons across different software and hardware stacks on current systems.
The fast pace of AI innovation also demands an agile methodology for co-designing future AI training and inference systems: reproducing workload behavior on current production systems, then carrying that behavior over to the simulators and emulators used to design next-generation systems. This is often challenging because many critical workloads for cloud, hyperscaler, and enterprise customers use proprietary code, models, and architectures that are difficult to share. Addressing this challenge requires a standardized mechanism for defining and sharing workloads without sharing the complete software stack, as well as the ability to interpret, compare, and project behavior across the wide range of proprietary simulation and emulation tools.
Introducing Chakra
To meet this need, MLCommons® is developing Chakra — an open and interoperable graph-based representation for standardizing AI/Machine Learning (ML) workload execution traces. Chakra’s execution traces represent key operations, such as computation, memory accesses, and communication, along with control dependencies, timing, and resource constraints. Chakra also includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra execution traces by a broad range of simulators, emulators, and replay tools.
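To make the idea of a graph-based execution trace concrete, here is a minimal Python sketch. It is illustrative only: the `TraceNode` class, its field names, the `NodeType` categories, and the sample operations are assumptions made for this example and are not the Chakra schema. It models operations as typed nodes with durations and dependencies, then derives one valid execution order.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter


class NodeType(Enum):
    # Hypothetical categories mirroring the operation kinds named in the text.
    COMPUTE = "compute"
    MEMORY = "memory"
    COMMUNICATION = "communication"


@dataclass
class TraceNode:
    # Hypothetical node layout for illustration; not the actual Chakra schema.
    node_id: int
    name: str
    node_type: NodeType
    duration_us: float  # measured or estimated run time in microseconds
    deps: list[int] = field(default_factory=list)  # ids that must finish first


def execution_order(nodes: list[TraceNode]) -> list[int]:
    """Return one dependency-respecting order of node ids."""
    sorter = TopologicalSorter({n.node_id: n.deps for n in nodes})
    return list(sorter.static_order())


# A made-up four-operation trace forming a simple dependency chain.
trace = [
    TraceNode(0, "load_weights", NodeType.MEMORY, 40.0),
    TraceNode(1, "matmul_fwd", NodeType.COMPUTE, 120.0, deps=[0]),
    TraceNode(2, "all_reduce_grads", NodeType.COMMUNICATION, 300.0, deps=[1]),
    TraceNode(3, "optimizer_step", NodeType.COMPUTE, 60.0, deps=[2]),
]

order = execution_order(trace)
print(order)
```

Because the nodes carry only operation types, timings, and dependencies, a trace of this shape could in principle be shared without exposing the model code that produced it, which is the motivation described above.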
“By standardizing the graph schema and fostering an ecosystem of open tools, we aim to develop a more agile methodology for reproducing workload behavior in current systems via trace replay, while simultaneously shaping the future of AI software and hardware co-design through simulators and emulators,” explains Srinivas Sridharan, Chakra Working Group Co-Chair and Meta Research Scientist.
“Chakra provides the flexibility to concentrate on specific aspects of workload optimization in isolation, such as compute, memory, or network, and study the direct impact of these optimizations on the original workload execution,” adds Tushar Krishna, Chakra Working Group Co-Chair and Associate Professor at Georgia Tech.
The Chakra Working Group is developing the following workstreams:
- Schema standardization
- Execution trace collection (from popular AI frameworks such as PyTorch, TensorFlow/JAX/XLA, etc.)
- Benchmark suite development and metrics (e.g., as a first step, an easier way to reproduce MLPerf results)
- Chakra trace synthesis using ML models (for trace obfuscation and future workload projection)
- Support tools (e.g., analyzers, visualization, etc.)
- Downstream tools enablement (e.g., simulators, emulators, replay)
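As a rough illustration of what a downstream tool might do with a trace, the Python sketch below runs a simple what-if study: it estimates end-to-end time from a dependency graph and then rescales one category of operation in isolation. Everything here is an assumption for the example, including the `TRACE` layout, the operation names, the durations, and the `makespan` helper. It also assumes unlimited parallelism between independent operations, so it ignores the resource constraints a real simulator would model.

```python
from graphlib import TopologicalSorter

# Hypothetical trace: node name -> (category, duration in microseconds, dependencies).
TRACE = {
    "fwd": ("compute", 100.0, []),
    "bwd": ("compute", 200.0, ["fwd"]),
    "all_reduce": ("communication", 300.0, ["bwd"]),
    "prefetch": ("memory", 50.0, ["fwd"]),
    "step": ("compute", 40.0, ["all_reduce", "prefetch"]),
}


def makespan(trace, scale=None):
    """Estimate end-to-end time, assuming independent nodes run in parallel.

    `scale` maps a category to a multiplier on its durations, e.g.
    {"communication": 0.5} models a network that is twice as fast.
    """
    scale = scale or {}
    finish = {}
    graph = {name: deps for name, (_, _, deps) in trace.items()}
    for node in TopologicalSorter(graph).static_order():
        category, duration, deps = trace[node]
        # A node starts once its slowest dependency has finished.
        start = max((finish[d] for d in deps), default=0.0)
        finish[node] = start + duration * scale.get(category, 1.0)
    return max(finish.values())


baseline = makespan(TRACE)
faster_network = makespan(TRACE, {"communication": 0.5})
print(baseline, faster_network)  # 640.0 490.0
```

In this made-up trace the all-reduce sits on the critical path, so halving communication time shortens the whole run, whereas speeding up the memory prefetch alone would change nothing.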
Join our effort
We invite anyone interested in contributing to this effort to join the Chakra Working Group. To sign up for the group mailing list, receive the meeting invite, and access the shared drive resources, request to join the Chakra Working Group.
For more information, contact the Chakra Working Group chairs Srinivas Sridharan (Meta) and Tushar Krishna (Georgia Tech) at [email protected].