diff --git a/README.md b/README.md index eff3bf64fc..4a880608e9 100644 --- a/README.md +++ b/README.md @@ -5,28 +5,32 @@ [![GHA](https://img.shields.io/github/v/tag/iterative/setup-cml?label=GitHub%20Actions&logo=GitHub)](https://github.com/iterative/setup-cml) [![npm](https://img.shields.io/npm/v/@dvcorg/cml?logo=npm)](https://www.npmjs.com/package/@dvcorg/cml) -**What is CML?** Continuous Machine Learning (CML) is an open-source library for -implementing continuous integration & delivery (CI/CD) in machine learning -projects. Use it to automate parts of your development workflow, including model -training and evaluation, comparing ML experiments across your project history, -and monitoring changing datasets. +**What is CML?** Continuous Machine Learning (CML) is an open-source CLI tool +for implementing continuous integration & delivery (CI/CD) with a focus on +MLOps. Use it to automate parts of development workflows, including machine +provisioning; model training and evaluation; comparing ML experiments across +project history; and monitoring changing datasets. -![](https://static.iterative.ai/img/cml/github_cloud_case_lessshadow.png) _On -every pull request, CML helps you automatically train and evaluate models, then -generates a visual report with results and metrics. Above, an example report for -a [neural style transfer model](https://github.com/iterative/cml_cloud_case)._ +For example, on every pull request CML can help to automatically train and +evaluate models, then generate a visual report with results and metrics. -We built CML with these principles in mind: +![](https://static.iterative.ai/img/cml/github_cloud_case_lessshadow.png) _An +example report for a +[neural style transfer model](https://github.com/iterative/cml_cloud_case)._ + +CML principles: - **[GitFlow](https://nvie.com/posts/a-successful-git-branching-model/) for data science.** Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when.
Codify data and models with [DVC](#using-cml-with-dvc) instead of pushing to a Git repo. - **Auto reports for ML experiments.** Auto-generate reports with metrics and - plots in each Git Pull Request. Rigorous engineering practices help your team + plots in each Git pull request. Rigorous engineering practices help your team make informed, data-driven decisions. -- **No additional services.** Build your own ML platform using just GitHub or - GitLab and your favourite cloud services: AWS, Azure, GCP. No databases, +- **No additional services.** Build your own ML platform using GitHub, GitLab, + or Bitbucket. Optionally, use + [cloud storage](#configuring-cloud-storage-providers) as well as either + self-hosted or cloud runners (such as AWS EC2, Azure, or GCP). No databases, services or complex setup needed. :question: Need help? Just want to chat about continuous integration for ML? @@ -38,27 +42,38 @@ for hands-on MLOps tutorials using CML! ## Table of contents -1. [Usage](#usage) -2. [Getting started (tutorial)](#getting-started) -3. [Using CML with DVC](#using-cml-with-dvc) -4. [Using self-hosted runners](#using-self-hosted-runners) -5. [Install CML as a package](#install-cml-as-a-package) -6. [Example Projects](#see-also) +1. [Setup](#setup) +2. [Usage](#usage) +3. [Getting started (tutorial)](#getting-started) +4. [Using CML with DVC](#using-cml-with-dvc) +5. [Using self-hosted runners](#setup-self-hosted-runners) +6. [Install CML as a package](#install-cml-as-a-package) +7. [Example Projects](#see-also) -## Usage +## Setup You'll need a GitHub or GitLab account to begin. Users may wish to familiarize themselves with [Github Actions](https://help.github.com/en/actions) or [GitLab CI/CD](https://about.gitlab.com/stages-devops-lifecycle/continuous-integration). Here, will discuss the GitHub use case. -- **GitLab users**: Please see our - [docs about configuring CML with GitLab](https://github.com/iterative/cml/wiki/CML-with-GitLab).
-- **Bitbucket Cloud users**: Please see our - [docs on CML with Bitbucket Cloud](https://github.com/iterative/cml/wiki/CML-with-Bitbucket-Cloud). - _Bitbucket Server support estimated to arrive by May 2021._ -- **GitHub Actions users**: The key file in any CML project is - `.github/workflows/cml.yaml`: +### GitLab + +Please see our docs on +[CML with GitLab](https://github.com/iterative/cml/wiki/CML-with-GitLab) and in +particular the +[personal access token](https://github.com/iterative/cml/wiki/CML-with-GitLab#variables) +requirement. + +### Bitbucket Cloud + +Please see our docs on +[CML with Bitbucket Cloud](https://github.com/iterative/cml/wiki/CML-with-Bitbucket-Cloud). +_Bitbucket Server support estimated to arrive by mid 2021._ + +### GitHub Actions + +The key file in any CML project is `.github/workflows/cml.yaml`: ```yaml name: your-workflow-name @@ -92,6 +107,8 @@ jobs: cml-send-comment report.md ``` +## Usage + We helpfully provide CML and other useful libraries pre-installed on our [custom Docker images](https://github.com/iterative/cml/blob/master/Dockerfile). In the above example, uncommenting the field @@ -118,12 +135,13 @@ those reports to your CI system (GitHub Actions or GitLab CI). | `cml-pr` | Create a pull request. | TODO | | `cml-tensorboard-dev` | Return a link to a Tensorboard.dev page | `--logdir --title --md` | -### Customizing your CML report +#### CML Reports -CML reports are written in -[GitHub Flavored Markdown](https://github.github.com/gfm/). That means they can -contain images, tables, formatted text, HTML blocks, code snippets and more — -really, what you put in a CML report is up to you. Some examples: +The `cml-send-comment` command can be used to post reports. CML reports are +written in [GitHub Flavored Markdown](https://github.github.com/gfm/). That +means they can contain images, tables, formatted text, HTML blocks, code +snippets and more — really, what you put in a CML report is up to you. 
Some +examples: :spiral_notepad: **Text** Write to your report using whatever method you prefer. For example, copy the contents of a text file containing the results of ML model @@ -273,7 +291,11 @@ jobs: > :warning: If you're using DVC with cloud storage, take note of environment > variables for your storage format. -### Environment variables for supported cloud providers +### Configuring Cloud Storage Providers + +There are many +[supported cloud storage providers](https://dvc.org/doc/command-reference/remote/modify#available-parameters-per-storage-type). +Here are a few examples for some of the most frequently used providers:
@@ -356,7 +378,7 @@ env:
-## Using self-hosted runners +## Setup: Self-hosted Runners GitHub Actions are run on GitHub-hosted runners by default. However, there are many great reasons to use your own runners: to take advantage of GPUs; to @@ -509,7 +531,7 @@ compute resources as secrets. In the above example, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are required to deploy EC2 instances. Please see our docs about -[environment variables needed to authenticate with supported cloud services](#environment-variables-for-supported-cloud-providers). +[configuring cloud storage providers](#configuring-cloud-storage-providers). ### On-premise (local) runners