
MLCommons® Science Working Group Benchmark Submission Rules

1. Differences from the General Submission Rules

The general MLCommons® submission rules are summarized in the General Submission Rules document.

This document lists the differences that apply to MLCommons Science Working Group submissions. The differences are summarized as follows:

  • Rolling submission with review

  • The directory structure for the coordinated storage of results as required by MLCommons

  • Augmentation of the code to produce valid mllog files

All other requirements are the same as those discussed in the General Submission Rules document.

2. Overview of the Rolling Submission Process

The submission process is designed to meet the following goals:

  • Submissions can be made to the MLCommons® Science GitHub at any time for any benchmark that has been released.

  • Submissions will be checked and then reviewed by the working group, which has a review committee for each benchmark.

  • Depending on the number of scientific innovations in the submission, the review time will vary.

  • Submitters will receive an acknowledgment of the submission and a customized response from the committee within a week of the submission date.

  • This second response will indicate the estimated time for a committee review to be completed.

  • On completion of the committee review, all submissions that are considered in scope will be posted on the working group GitHub, which includes a scientific discovery "leaderboard" for each benchmark. Updates will be summarized quarterly.

  • The innovations will be described and can include aspects other than final accuracy; for example, a submission might need a smaller dataset to achieve an interesting accuracy. It is expected that benchmarks will be posted for at least a year to gather a rich set of input.

3. Submission and Result Repositories

4. Timeline of Deployment

Submissions are open and can be made at any time.

5. Logging Libraries

Codes augmented for inclusion in the science benchmarks must use the MLCommons logging library (mllog) to produce valid log events.

An alternative library that internally produces MLCommons® events for logging is also available. It has the advantage of generating a human-readable summary table in addition to the MLCommons® log events.
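For orientation, the following minimal sketch, assuming the MLCommons mlperf-logging reference package (which provides the mllog interface), shows how log events are produced in code; the file name and key names are illustrative only:

    # Minimal sketch of producing MLCommons log events in Python.
    # Assumes the mlperf-logging reference package; key names are illustrative.
    from mlperf_logging import mllog

    mllog.config(filename="result-1.txt")   # events are appended to this file
    mllogger = mllog.get_mllogger()

    mllogger.start(key="run_start")                  # INTERVAL_START entry
    mllogger.event(key="eval_accuracy", value=0.97,  # POINT_IN_TIME entry
                   metadata={"epoch_num": 10})
    mllogger.end(key="run_stop")                     # INTERVAL_END entry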

6. Directory Structure


Note: The original directory structure rule (Section 13) will be removed from the document and placed here.


In this section, we document the directory structure for submissions. We introduce the following variables, denoted by { } around the variable name. The brackets [ ] are used to denote a list.

{organization} ::= The organization submitting the benchmark

{application} ::= The application, a value from [cloudmask,earthquake,uno,stemdl]

{system} ::= Defines the system used for this benchmark

{descriptor} ::= A unique descriptor of the experiment, as described in scientific_contribution.pdf. For example, experiment-1 or faultzone-3.

{n} ::= the number of the repeated experiment

All results are stored in a directory such as

{organization}/{application}/{system}/{descriptor}/
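For illustration only (the organization, system, and descriptor names here are hypothetical), a submission might therefore be stored under

exampleorg/earthquake/examplehpc-a100/experiment-1/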

Within this directory, all parameters for that experiment are stored, so that all information for the experiment is self-contained within that directory.

This includes:

  1. A number of scripts that are used to run the particular benchmark on the specified system to allow reproducibility.

  2. result-n.txt ::= The result logs for the n-th run with the parameters defined by config.yaml

  3. config.yaml ::= A configuration file that contains all hyperparameters and other parameters that define a run. This configuration file contains an entry that uniquely identifies the version of the code that is run; that version must be included in the MLCommons benchmark repository. It also includes all hyperparameters, including new ones particular to this approach. The configuration file should include enough detail to replicate the experiment, with the locations of the program and the data. If the data does not fit in a GitHub repository, it can be placed in a publicly accessible data store. Its location needs to be specified as an endpoint in the YAML file, with a command-line example of how to retrieve it. As multiple files could be needed, a list of commands can be specified. An example of the configuration format in the YAML file is:

     github:
       description: Earthquake Prediction
       repo: https://mlcommons.github.com/...
       branch: main
       version: 1.0
       tag: 1.0
     data:
       - aws s3 sync ....
  4. scientific_contribution.pdf ::= A detailed description of the scientific contribution and the algorithms and associated hyperparameters used in the benchmark.

  5. A README.md file that describes how to run the benchmark. The README.md must have sufficient information to reproduce such runs. In some cases, a program may be used to run multiple experiments and create such a directory automatically. Enough information must be included in the directory so that such parameterized runs can be conducted while also replicating the appropriate directory structure. We require a separate subdirectory for each result so that output notebooks and comments can be submitted for each result if needed. This is especially the case when Jupyter notebooks are used as the benchmark to be executed, allowing the notebook with all its cells to be submitted along with the result-n.txt file.

  6. Log File requirements.

  7. The log file must have an organization record in mllog entry format. This includes mllog entries of type POINT_IN_TIME with the following keys (a minimal sketch of emitting these records is shown at the end of this section):

    • submission_benchmark

    • submission_org

    • submission_division

    • submission_version

    • submission_github_commit_version

    • submission_status

    • submission_platform

  8. The submission division for science is open and must be recorded in the submission_division field. Currently, we have a number of benchmarks defined by the codes for cloudmask, earthquake, stemdl, and uno contained in the science benchmark repository.

  9. The version used for the benchmark needs to be added to the submission log record. The version is included in a VERSION.txt file within the benchmark and is hardcoded in the program. In addition, the GitHub commit version needs to be added to the log record. You can obtain that version from the command line, while in the code repository, with git rev-parse HEAD.

  10. Scientific Result. Each benchmark must have an mllog entry POINT_IN_TIME with the key "result" and, as its value, a dict describing the result format and meaning. The result must be documented in detail in the scientific_contribution.pdf file.

  11. Uploading Results

The results are presently managed in GitHub.

You will need to create a fork and commit your results within the fork in the appropriate benchmark directories. Results for each benchmark are for the open division only. Placeholder directories for the various benchmarks are included in these directories; place your benchmark results in the appropriate directory. Once committed to your fork, you can create a pull request, which will then be reviewed.

If you have issues with the submission or need help, please contact the MLCommons Science Working Group via the Google group.
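To make the logging requirements above concrete, the following minimal sketch, assuming the MLCommons mlperf-logging reference package and using purely hypothetical organization, platform, and result values, writes the organization records, the version information, and the scientific result as POINT_IN_TIME entries:

    # Minimal sketch of the required submission records and scientific result.
    # Hypothetical values; assumes the mlperf-logging package and a git checkout.
    import subprocess
    from mlperf_logging import mllog

    mllog.config(filename="result-1.txt")            # log file for this run
    mllogger = mllog.get_mllogger()

    # Version information as required above.
    version = open("VERSION.txt").read().strip()     # hardcoded benchmark version
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True).strip()

    # Organization records (POINT_IN_TIME entries).
    mllogger.event(key="submission_benchmark", value="earthquake")
    mllogger.event(key="submission_org", value="exampleorg")
    mllogger.event(key="submission_division", value="open")
    mllogger.event(key="submission_version", value=version)
    mllogger.event(key="submission_github_commit_version", value=commit)
    mllogger.event(key="submission_status", value="onprem")          # illustrative value
    mllogger.event(key="submission_platform", value="examplehpc-a100")

    # Scientific result: a dict describing the result format and meaning,
    # documented in detail in scientific_contribution.pdf.
    mllogger.event(key="result", value={"metric": "example-accuracy", "value": 0.97})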

7. References

We include here a list of supporting and related documents.