When the MLCommons Medical AI working group set out to develop MedPerf, an open framework for benchmarking medical AI on real-world private datasets, transparency and privacy were the main technical objectives. The current implementation of MedPerf ensures this transparency and integrity through a formal benchmark committee. The committee is a governance body that oversees the entire benchmark process, from committee formation and reference implementation development to benchmark execution and the dissemination of and access to benchmark results. Critical to MedPerf is its federated evaluation approach, which delivers a high level of data privacy by requiring no transfer of medical data to perform the benchmarks.

Figure 1. Current MedPerf benchmark workflow. Red circle shows current policy implementation.
The working group, in collaboration with technical member organization Intel and academic organization Notre Dame, is improving MedPerf with a new policy-enhanced workflow enforced by smart contracts. A smart contract is delivered via a trusted execution environment, such as Intel Software Guard Extensions (SGX), to ensure stronger transparency and privacy. The new workflow protects digital assets by combining a confidential smart contract framework with private data objects, which bind data policy enforcement to policy evaluation and the isolated execution of data tasks. This new approach, proven through a proof-of-concept (POC), delivers a higher level of data transparency, accountability, and traceability.
The importance of policies to protect digital assets
Digital asset usage policies are critical to protecting intellectual property (IP) such as data and AI models. Unfortunately, most existing solutions share digital assets by copying them under access controls, and they rely on centralized governance for policy enforcement. In practice, this can reduce transparency and increase the risk of IP theft or leakage.
This risk is more acute in healthcare, where data and AI models may contain personally identifiable information as well as highly valuable IP. Transparency for medical data is critical to maintaining the traceability and accountability of processes such as training, validation, and benchmarking. Any lack of transparency can be magnified in decentralized settings such as data federations, where assets are hosted and transferred in a decentralized manner.
Confidential smart contract frameworks that are based on trusted execution environments and use private data objects have been shown to help with policy enforcement by binding policy evaluation to isolated task execution, thereby protecting digital assets. Such frameworks can facilitate a high level of transparency, accountability, and traceability, but to date they have been largely unexplored in the medical AI space.
MedPerf smart contracts act as enforcement vehicles for healthcare dataset use policies
Current use policies for MedPerf datasets are implicitly hardcoded in the benchmark workflow: a data owner must execute any requested benchmark on their local data. This approach gives data owners full control of their data assets, but it offers little flexibility for customization and does not scale well, for example to decentralized settings or to a growing number of benchmark requests.
To enhance policies and provide broader flexibility and scalability in MedPerf, the working group built a trusted-execution-environment POC using a confidential smart contract framework. This approach delivers policy enforcement via a private data object by binding policy evaluation and isolated task executions to protect digital assets.
To initiate a smart contract, a MedPerf data owner creates a private data object contract that is linked to their original dataset through the dataset ID. The contract contains descriptive information about the dataset and acts as an interface to it. The data owner then securely deposits the dataset inside a trusted-execution-environment enclave run by a digital guardian service. The enclave can be hosted on premises or in the cloud. When a benchmark of a model on the dataset is requested, the contract and the digital guardian perform bidirectional attestation to verify each other's integrity. This approach retains MedPerf's federated evaluation: the dataset never leaves the secure domain of the enclave, and its use is bounded by policy enforcement.
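The steps above can be illustrated with a toy Python sketch. It is not MedPerf code and does not use any real smart contract or enclave API: every class, field, and identifier here (UsePolicy, DigitalGuardian, DatasetContract, "dataset-001", "bench-A") is a hypothetical name chosen for illustration, the policy fields are assumed, and attestation is simulated by comparing SHA-256 digests rather than hardware-backed quotes.

```python
import hashlib
from dataclasses import dataclass, field


def measurement(code_identity: str) -> str:
    """Simulated enclave measurement: a SHA-256 digest of a code identity string."""
    return hashlib.sha256(code_identity.encode()).hexdigest()


@dataclass
class UsePolicy:
    """Hypothetical use policy chosen by the data owner."""
    allowed_benchmarks: set
    max_runs: int


@dataclass
class DigitalGuardian:
    """Toy stand-in for the guardian service holding the dataset in an enclave."""
    code_identity: str
    expected_contract: str  # measurement the guardian expects from the contract
    _datasets: dict = field(default_factory=dict)

    def deposit(self, dataset_id, records):
        self._datasets[dataset_id] = list(records)

    def attest(self):
        return measurement(self.code_identity)

    def run(self, dataset_id, contract_measurement, model):
        # Guardian side of the two-way check: verify the calling contract.
        if contract_measurement != self.expected_contract:
            raise PermissionError("contract failed attestation")
        records = self._datasets[dataset_id]
        correct = sum(1 for x, y in records if model(x) == y)
        # Only the aggregate score is returned; the records are not.
        return correct / len(records)


@dataclass
class DatasetContract:
    """Toy contract: dataset metadata, use policy, and an audit log."""
    dataset_id: str
    description: str
    policy: UsePolicy
    code_identity: str
    expected_guardian: str  # measurement the contract expects from the guardian
    runs: int = 0
    log: list = field(default_factory=list)

    def attest(self):
        return measurement(self.code_identity)

    def request_benchmark(self, benchmark_id, model, guardian):
        # Contract side of the two-way check: verify the guardian.
        if guardian.attest() != self.expected_guardian:
            raise PermissionError("guardian failed attestation")
        # Policy evaluation happens before any task is executed.
        if benchmark_id not in self.policy.allowed_benchmarks:
            self.log.append((benchmark_id, "denied: benchmark not allowed"))
            raise PermissionError("benchmark not allowed by policy")
        if self.runs >= self.policy.max_runs:
            self.log.append((benchmark_id, "denied: run limit reached"))
            raise PermissionError("run limit reached")
        score = guardian.run(self.dataset_id, self.attest(), model)
        self.runs += 1
        self.log.append((benchmark_id, "approved"))
        return score


GUARDIAN_CODE = "guardian-v1"
CONTRACT_CODE = "contract-v1"

guardian = DigitalGuardian(GUARDIAN_CODE, measurement(CONTRACT_CODE))
guardian.deposit("dataset-001", [(1, 1), (2, 0), (3, 1), (4, 0)])

contract = DatasetContract(
    dataset_id="dataset-001",
    description="toy labelled records",
    policy=UsePolicy(allowed_benchmarks={"bench-A"}, max_runs=1),
    code_identity=CONTRACT_CODE,
    expected_guardian=measurement(GUARDIAN_CODE),
)


def model(x):
    """Toy model under test: predicts the parity of its input."""
    return x % 2


score = contract.request_benchmark("bench-A", model, guardian)
print(score, contract.log)
```

In this sketch, the policy is data attached to the contract rather than logic hardcoded in the workflow, so an owner could change the allowed benchmarks or the run limit without altering how requests are processed. The log also records every decision, including denials, which is one simple way to picture the traceability goal.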

Figure 2. Policy-enhanced benchmark workflow. Red circle shows smart-contract policy implementation.
Enhanced benchmark workflow benefits
The new policy-enhanced workflow gives owners of digital assets, such as healthcare datasets and models, more power in benchmarking workflows. The POC has validated the use of embedded smart contract policies in MedPerf while leaving digital asset owners in full control of their assets.
This new approach increases the utility and potential rewards of benchmarks by supporting integrity and enabling transparency throughout the execution cycle.
To learn more about how the policy is implemented, read the Technical Update.
The Medical AI working group calls on the broad community of academics, technologists, regulatory scientists, healthcare experts, and others to help shape the roadmap and contribute to impactful benchmarking technologies by joining the MLCommons Medical AI working group.