
Introduction to the MLCommons CM language

During the past 10 years, the community has considerably improved the reproducibility of experimental results from research projects and published papers by introducing the artifact evaluation process with a unified artifact appendix and reproducibility checklists, Jupyter notebooks, containers, and Git repositories.

On the other hand, our experience reproducing more than 150 papers shows that it still takes weeks or even months of painful and repetitive interactions between teams to reproduce experimental results. This effort includes deciphering numerous README files, examining ad-hoc artifacts and containers, and figuring out how to reproduce computational results. Furthermore, snapshot containers make it difficult to optimize the performance, accuracy, power consumption, and operational costs of algorithms across the diverse and rapidly evolving software, hardware, and data used in the real world.

This practical experience and the feedback from the community motivated us to establish the MLCommons Task Force on Automation and Reproducibility and to develop a simple, technology-agnostic, and English-like automation language called Collective Mind (MLCommons CM).

This language provides a universal interface to any software project and transforms it into a database of reusable automation actions and portable scripts in a transparent and non-intrusive way. Following FAIR principles, CM automation actions and scripts are simple wrappers around existing user scripts and artifacts that make them:

  • findable via human-readable tags, aliases and unique IDs;
  • accessible via a unified CM CLI and Python API with JSON/YAML meta descriptions;
  • interoperable and portable across any software, hardware, models and data;
  • reusable across all projects.
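The "findable via human-readable tags, aliases and unique IDs" idea can be illustrated with a small sketch. This is not the CM API: the `find` function, the meta layout, and the sample aliases, tags, and IDs below are all hypothetical and only show how JSON meta descriptions might let wrapped scripts be looked up without knowing where they live.

```python
import json

# Hypothetical JSON meta descriptions for three wrapped scripts.
# The field names (alias, uid, tags) are assumptions for illustration,
# not the actual CM meta format.
META_JSON = """
[
  {"alias": "detect-os",       "uid": "0001", "tags": ["detect", "os"]},
  {"alias": "get-python",      "uid": "0002", "tags": ["get", "python"]},
  {"alias": "get-python-venv", "uid": "0003", "tags": ["get", "python", "venv"]}
]
"""


def find(entries, tags=None, alias=None, uid=None):
    """Return the entries that carry every requested tag and, if given,
    match the requested alias and unique ID."""
    wanted = set(tags or [])
    found = []
    for entry in entries:
        if alias is not None and entry["alias"] != alias:
            continue
        if uid is not None and entry["uid"] != uid:
            continue
        if not wanted.issubset(entry["tags"]):
            continue
        found.append(entry)
    return found


entries = json.loads(META_JSON)

# Look up scripts by tags only; two of the three entries carry both tags.
print([e["alias"] for e in find(entries, tags=["get", "python"])])
# → ['get-python', 'get-python-venv']
```

Matching on a set of tags rather than on file paths is what keeps such a lookup stable when the underlying scripts move or change, which is the property the list above describes.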

CM is powered by Python, JSON and/or YAML meta descriptions, and a unified CLI to minimize the learning curve. It helps researchers and practitioners describe, share, and reproduce experimental results in a unified, portable, and automated way across rapidly evolving software, hardware, and data, while solving the "dependency hell" problem and automatically generating unified README files and modular containers.

Our ultimate goal is to use the CM language to facilitate reproducible AI/ML systems research, minimize manual and repetitive benchmarking and optimization efforts, and reduce time and costs when transferring technology to production across continuously changing software, hardware, models, and data.

See a few real-world examples of using the CM scripting language:

Read more about our long-term vision in our ACM REP'23 keynote "Toward a common language to facilitate reproducible research and technology transfer: challenges and solutions".

Archive (previous CK version before MLCommons)