The general MLCommons® submission rules are summarized in the MLCommons General Submission Rules document. This document lists the differences that apply to MLCommons Science Working Group submissions. The differences are:
- Rolling submission with review
- The directory structure for the coordinated storing of results as required by MLCommons
- Augmentation of the code to produce valid mllog files
All other requirements are the same as those discussed in the General Submission Rules document.
The following goals are captured by the submission process:
- Submissions can be made to the MLCommons® Science GitHub at any time for any benchmark that has been released.
- Submissions will be checked and then reviewed by the working group, which has a review committee for each benchmark.
- Depending on the number of scientific innovations in the submission, the review time will vary.
- The submitters will receive an acknowledgment of the submission and a customized response from the committee within a week of the submittal date.
- This second response will indicate the estimated time for the committee review to be completed.
- On completion of the committee review, all submissions that are considered in scope will be posted on the working group GitHub, which includes a scientific discovery "leaderboard" for each benchmark. Updates will be summarized quarterly.
- The innovations will be described and can include aspects other than final accuracy; for example, a submission might need a smaller dataset to achieve an interesting accuracy. It is expected that benchmarks will be posted for at least a year to gather a rich set of input.
- https://github.com/mlcommons/submissions_science : A private repository in which the science WG members are maintainers and coordinate the submission of results. To upload as a non working group member, you need to contact the WG through the MLCommons Science Google Group mailing list (https://groups.google.com/a/mlcommons.org/g/science).
- https://github.com/mlcommons/science_results : The public repository in which we post the results.
Codes augmented for consideration for inclusion into the science benchmarks must use the MLCommons mllog logging library.
An alternative library that internally produces MLCommons® events for logging may also be used. This library has the advantage of generating a human-readable summary table in addition to the MLCommons® log events.
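As orientation only, a minimal sketch of producing such mllog entries with the mlperf_logging package (the MLCommons logging library) is shown below; the file name, keys, and values used are illustrative assumptions, not values prescribed by these rules.

```python
# Minimal sketch, assuming the mlperf_logging package from
# https://github.com/mlcommons/logging is installed. The file name,
# keys, and values below are illustrative only.
from mlperf_logging import mllog

mllog.config(filename="result-1.txt")   # write mllog entries to the run's result file
mllogger = mllog.get_mllogger()

# A POINT_IN_TIME entry is produced with .event(); timed intervals
# are bracketed with .start() and .end().
mllogger.event(key="submission_org", value="exampleorg")
mllogger.start(key="run_start")
# ... benchmark training / inference would run here ...
mllogger.end(key="run_stop")
```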
Note: The original directory structure rule (Section 13) will be removed from that document and placed here.
In this section, we document the directory structure for submissions. We introduce the following variables, denoted by { } around the variable name. The brackets [ ] are used to denote a list.

{organization} ::= The organization submitting the benchmark
{application} ::= The application, a value from [cloudmask, earthquake, uno, stemdl]
{system} ::= The system used for this benchmark
{descriptor} ::= A unique descriptor of the experiment, as described in scientific_contribution.pdf, for example experiment-1 or faultzone-3
{n} ::= The number of repeated experiments
All results are stored in a directory such as
{organization}/{application}/{system}/{descriptor}/
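For illustration, a hypothetical submission from an organization exampleorg running the earthquake benchmark on a system called examplehpc under the descriptor experiment-1 (these three names are made up for this example) would be laid out as follows, containing the files described in the list below:

```
exampleorg/earthquake/examplehpc/experiment-1/
├── README.md
├── config.yaml
├── scientific_contribution.pdf
├── run.sh          (example of a script used to run the benchmark)
└── result-1.txt
```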
Within this directory, all parameters for that experiment are stored, so that all information for the experiment is self-contained. This includes:
- A number of scripts that are used to run the particular benchmark on the specified system to allow reproducibility.
- result-n.txt ::= The result log for the n-th run with the parameters defined by config.yaml.
- config.yaml ::= A configuration file that contains all hyperparameters and other parameters needed to define a run. This configuration file contains an entry that uniquely describes the version of the code that is run. The version must be included in the MLCommons benchmark repository. This also includes all hyperparameters, including new ones particular to this approach. The configuration file should include enough detail to replicate the experiment, including the locations of the program and the data. If the data does not fit in a GitHub repository, it can be placed in a publicly accessible data store. Its location needs to be specified as an endpoint in the YAML file, with a command line example of how to retrieve it. As multiple files could be needed, a list of commands can be specified. An example of the configuration format in the YAML file is:

  ```yaml
  github:
    description: EarthQuake Prediction
    repo: https://mlcommons.github.com/...
    branch: main
    version: 1.0
    tag: 1.0
  data:
    - aws s3 rsync ....
  ```
- scientific_contribution.pdf ::= A detailed description of the scientific contribution and the algorithms and associated hyperparameters used in the benchmark.
- A README.md file that describes how to run it. The README.md must have sufficient information to create such runs. In some cases, a program may be used to run multiple experiments and create such a directory automatically. Enough information must be included in the directory so that such parameterized runs can be conducted while also replicating the appropriate directory structure. The reason we require a separate subdirectory for each result is to allow output notebooks and comments to be submitted for each of the results if needed. This is especially the case when Jupyter notebooks are used as the benchmark to be executed, allowing the notebook with all its cells to be submitted along with the result-n.txt file.
- Log file requirements:
  - The log file must have an Organization Record in mllog entry format. This includes mllog entries for POINT_IN_TIME with the following keys (a sketch illustrating these entries follows this list):
    - submission_benchmark
    - submission_org
    - submission_division
    - submission_version
    - submission_github_commit_version
    - submission_status
    - submission_platform
  - The submission division for science is open and must be selected in the submission_division field. Currently, we have a number of benchmarks defined by the codes for cloudmask, earthquake, stemdl, and uno contained in the science benchmark repository.
  - The version in the GitHub VERSION.txt file used for the benchmark needs to be added to the submission log record. The version is included in a VERSION.txt file within the benchmark and is hardcoded in the program. In addition, the GitHub commit version needs to be added to the program. You can obtain that version, while in the code repository, from the command line with git rev-parse HEAD.
  - Scientific result: Each benchmark must have an mllog entry POINT_IN_TIME with the key "result" and, as its value, a dict describing the result format and meaning. The result must be documented in detail in the scientific_contribution.pdf file.
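To make the log file requirements above concrete, the following sketch uses the mlperf_logging mllog API to record the organization entries, the version from VERSION.txt, the GitHub commit obtained with git rev-parse HEAD, and a result entry. All values, file locations, and the example metric are hypothetical assumptions; only the keys are those required above.

```python
# Illustrative sketch only: values and paths are hypothetical examples;
# the keys correspond to the required organization record entries.
import subprocess
from mlperf_logging import mllog

mllog.config(filename="result-1.txt")
log = mllog.get_mllogger()

# Organization record (POINT_IN_TIME entries).
log.event(key="submission_benchmark", value="earthquake")
log.event(key="submission_org", value="exampleorg")        # hypothetical organization
log.event(key="submission_division", value="open")         # science submissions are open division
log.event(key="submission_status", value="onprem")         # hypothetical status value
log.event(key="submission_platform", value="examplehpc")   # hypothetical platform name

# Version from the VERSION.txt file shipped with the benchmark code.
with open("VERSION.txt") as f:
    log.event(key="submission_version", value=f.read().strip())

# GitHub commit of the code that was actually run (git rev-parse HEAD).
commit = subprocess.run(["git", "rev-parse", "HEAD"],
                        capture_output=True, text=True, check=True).stdout.strip()
log.event(key="submission_github_commit_version", value=commit)

# Scientific result: POINT_IN_TIME entry with key "result" and a dict value
# describing the result format and meaning (example values only).
log.event(key="result", value={"metric": "example_accuracy",
                               "value": 0.93,
                               "meaning": "hypothetical accuracy on the validation set"})
```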
Uploading Results
The results are presently managed in GitHub.
You will need to create a fork and commit your own results within the fork in the appropriate benchmark directories. Results for each benchmark are for the open division only. Placeholder directories for the various benchmarks are included; you will need to place your benchmark in the appropriate directory. Once the results are committed to your fork, you can create a pull request, which will then be reviewed.
If you have issues with the submission or need help, please contact the MLCommons Science Working Group via the Google group.
We include here a list of supporting and related documents:
- [1] Overview presentation of the MLCommons® Science Working Group, Gregg Barrett, Wahid Bhimji, Bala Desinghu, Murali Emani, Geoffrey Fox, Grigori Fursin, Tony Hey, David Kanter, Christine Kirkpatrick, Hai Ah Nam, Juri Papay, Amit Ruhela, Mallikarjun Shankar, Jeyan Thiyagalingam, Aristeidis Tsaris, Gregor von Laszewski, Feiyi Wang, Junqi Yin, MLCommons® Community Meeting (also available in Google docs), December 9, 2021.
- [2] AI Benchmarking for Science: Efforts from the MLCommons® Science Working Group, Jeyan Thiyagalingam, Gregor von Laszewski, Junqi Yin, Murali Emani, Juri Papay, Gregg Barrett, Piotr Luszczek, Aristeidis Tsaris, Christine Kirkpatrick, Feiyi Wang, Tom Gibbs, Venkatram Vishwanath, Mallikarjun Shankar, Geoffrey Fox, Tony Hey, June 2022.
- [8] Science Development GitHub Repository to prepare release candidates for the MLCommons® repository.