more README reworking
casperdcl committed Jun 28, 2021
1 parent 9091556 commit 7892077
Showing 1 changed file with 46 additions and 45 deletions.
91 changes: 46 additions & 45 deletions README.md
@@ -27,7 +27,7 @@ CML principles:
- **Auto reports for ML experiments.** Auto-generate reports with metrics and
plots in each Git pull request. Rigorous engineering practices help your team
make informed, data-driven decisions.
- **No additional services.** Build your own ML platform using GitHub, GitLab,
- **No additional services.** Build your own ML platform using GitLab, GitHub,
or BitBucket. Optionally, use
[cloud storage](#configuring-cloud-storage-providers) as well as either
self-hosted or cloud runners (such as AWS EC2, Azure, or GCP). No databases,
@@ -42,13 +42,12 @@ for hands-on MLOps tutorials using CML!

## Table of contents

1. [Setup](#setup)
1. [Setup (GitLab, GitHub, BitBucket)](#setup)
2. [Usage](#usage)
3. [Getting started (tutorial)](#getting-started)
4. [Using CML with DVC](#using-cml-with-dvc)
5. [Using self-hosted runners](#using-self-hosted-runners)
6. [Install CML as a package](#install-cml-as-a-package)
7. [Example Projects](#see-also)
5. [Advanced Setup (Self-hosted, local package)](#advanced-setup)
6. [Example projects](#see-also)

## Setup

@@ -119,21 +118,21 @@ and CML set up on an Ubuntu LTS base with CUDA libraries and

### CML Functions

CML provides a number of helper functions to help package the outputs of ML
workflows (including numeric data and visualizations about model performance)
into a CML report.
CML provides a number of functions to help package the outputs of ML workflows
(including numeric data and visualizations about model performance) into a CML
report.

Below is a table of CML functions for writing markdown reports and delivering
those reports to your CI system (GitHub Actions or GitLab CI).

| Function | Description | Inputs |
| ----------------------- | -------------------------------------------------------------- | ----------------------------------------------------------- |
| `cml-runner` | Starts a runner locally or in cloud providers | See [Arguments](https://github.com/iterative/cml#arguments) |
| `cml-publish` | Publish an image for writing to CML report. | `<path to image> --title <image title> --md` |
| `cml-send-comment` | Return CML report as a comment in your GitHub/GitLab workflow. | `<path to report> --head-sha <sha>` |
| `cml-send-github-check` | Return CML report as a check in GitHub | `<path to report> --head-sha <sha>` |
| `cml-pr` | Create a pull request. | TODO |
| `cml-tensorboard-dev` | Return a link to a Tensorboard.dev page | `--logdir <path to logs> --title <experiment title> --md` |
| Function | Description | Example Inputs |
| ----------------------- | ---------------------------------------------------------------- | ----------------------------------------------------------- |
| `cml-runner` | Launch a runner locally or hosted by a cloud provider | See [Arguments](https://github.com/iterative/cml#arguments) |
| `cml-publish` | Publicly host an image for displaying in a CML report | `<path to image> --title <image title> --md` |
| `cml-send-comment` | Return CML report as a comment in your GitHub/GitLab workflow | `<path to report> --head-sha <sha>` |
| `cml-send-github-check` | Return CML report as a check in GitHub | `<path to report> --head-sha <sha>` |
| `cml-pr` | Commit the given files to a new branch and create a pull request | `<path>...` |
| `cml-tensorboard-dev` | Return a link to a Tensorboard.dev page | `--logdir <path to logs> --title <experiment title> --md` |

#### CML Reports

@@ -160,7 +159,7 @@ report. For example, if `graph.png` is output by `python train.py`, run:
cml-publish graph.png --md >> report.md
```

## Getting Started
### Getting Started

1. Fork our
[example project repository](https://github.com/iterative/example_cml).
@@ -214,13 +213,13 @@ git add . && git commit -m "modify forest depth"
git push origin experiment
```

5. In GitHub, open up a Pull Request to compare the `experiment` branch to
5. In GitHub, open up a pull request to compare the `experiment` branch to
`master`.

![](https://static.iterative.ai/img/cml/make_pr.png)

Shortly, you should see a comment from `github-actions` appear in the Pull
Request with your CML report. This is a result of the `cml-send-comment`
Shortly, you should see a comment from `github-actions` appear in the pull
request with your CML report. This is a result of the `cml-send-comment`
function in your workflow.

![](https://static.iterative.ai/img/cml/first_report.png)
@@ -236,7 +235,7 @@ performance metrics and visualizations — in GitHub checks and comments. What
kind of workflow you want to run, and want to put in your CML report, is up to
you.

## Using CML with DVC
### Using CML with DVC

In many ML projects, data isn't stored in a Git repository, but needs to be
downloaded from external sources. [DVC](https://dvc.org) is a common way to
@@ -291,7 +290,7 @@ jobs:
> :warning: If you're using DVC with cloud storage, take note of environment
> variables for your storage format.

### Configuring Cloud Storage Providers
#### Configuring Cloud Storage Providers

There are many
[supported cloud storage providers](https://dvc.org/doc/command-reference/remote/modify#available-parameters-per-storage-type).
@@ -378,7 +377,9 @@ env:

</details>

## Setup: Self-hosted Runners
## Advanced Setup

### Self-hosted Runners

GitHub Actions are run on GitHub-hosted runners by default. However, there are
many great reasons to use your own runners: to take advantage of GPUs; to
@@ -389,7 +390,7 @@ data.
> [official GitHub documentation](https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
> to get started setting up your own self-hosted runner.

### Allocating cloud resources with CML
#### Allocating Cloud Compute Resources with CML

When a workflow requires computational resources (such as GPUs), CML can
automatically allocate cloud instances using `cml-runner`. You can spin up
@@ -422,8 +423,8 @@ jobs:
cml-runner \
--cloud aws \
--cloud-region us-west \
--cloud-type=t2.micro \
--labels=cml-runner
--cloud-type t2.micro \
--labels cml-runner
model-training:
needs: [deploy-runner]
runs-on: [self-hosted, cml-runner]
@@ -446,10 +447,10 @@ instance in the `us-west` region. The `model-training` step then runs on the
newly-launched instance.

> :tada: **Note that you can use any container with this workflow!** While you
> must [have CML and its dependencies set up](#install-cml-as-a-package) to use
> functions such as `cml-send-comment` from your instance, you can create your
> favourite training environment in the cloud by pulling the Docker container of
> your choice.
> must [have CML and its dependencies set up](#local-package) to use functions
> such as `cml-send-comment` from your instance, you can create your favourite
> training environment in the cloud by pulling the Docker container of your
> choice.

We like the CML container (`docker://dvcorg/cml`) because it comes loaded with
Python, CUDA, `git`, `node` and other essentials for full-stack data science.
@@ -464,7 +465,7 @@ image tags. The tag convention is `{CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu}`:
For example, `docker://dvcorg/cml:0-dvc2-base1-gpu`, or
`docker://ghcr.io/iterative/cml:0-dvc2-base1`.
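
As a minimal sketch of the tag convention above, the helper below assembles a tag from its parts. The function name `cml_image_tag` and its argument order are hypothetical (they are not part of CML); only the `{CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu}` pattern comes from this README.

```bash
# Hypothetical helper (not shipped with CML): compose an image tag following
# the {CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu} convention.
# $1 = CML version, $2 = DVC version, $3 = base version, $4 = "gpu" (optional)
cml_image_tag() {
  if [ "${4:-}" = "gpu" ]; then
    printf '%s-dvc%s-base%s-gpu\n' "$1" "$2" "$3"
  else
    printf '%s-dvc%s-base%s\n' "$1" "$2" "$3"
  fi
}

cml_image_tag 0 2 1 gpu   # prints 0-dvc2-base1-gpu
cml_image_tag 0 2 1       # prints 0-dvc2-base1
```

The output can be appended to an image name, for example `docker://dvcorg/cml:$(cml_image_tag 0 2 1 gpu)`.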

### Arguments
#### Arguments

The `cml-runner` function accepts the following arguments:

@@ -519,10 +520,10 @@ Options:
-h Show help [boolean]
```

### Environment variables
#### Environment Variables

> :warning: You will need to
> [create a personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)
> [create a personal access token (PAT)](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)
> with repository read/write access and workflow privileges. In the example
> workflow, this token is stored as `PERSONAL_ACCESS_TOKEN`.

@@ -533,24 +534,23 @@ compute resources as secrets. In the above example, `AWS_ACCESS_KEY_ID` and
Please see our docs about
[configuring cloud storage providers](#configuring-cloud-storage-providers).
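
Before launching a runner, it can help to fail early if a required secret was not passed into the environment. The sketch below is one way to do that; `require_env` is a hypothetical helper (not a CML command), and the variable names in the usage comment are the ones mentioned in this README.

```bash
# Hypothetical helper (not part of CML): return non-zero if any of the named
# environment variables is unset or empty, listing each missing one on stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this portable across POSIX shells;
    # only pass trusted, literal variable names.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed placement: at the top of the step that calls cml-runner):
#   require_env PERSONAL_ACCESS_TOKEN AWS_ACCESS_KEY_ID || exit 1
```

The check only confirms that the variables are non-empty; it does not validate the token's scopes or the cloud credentials themselves.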

### On-premise (local) runners
#### On-premise (Local) Runners

This means using on-premise machines as self-hosted runners. The `cml-runner`
function is used to set up a local self-hosted runner. On your local machine or
on-premise GPU cluster, [install CML as a package](#install-cml-as-a-package)
and then run:
on-premise GPU cluster, [install CML as a package](#local-package) and then run:

```bash
cml-runner \
--repo $your_project_repository_url \
--token=$PERSONAL_ACCESS_TOKEN \
--token $PERSONAL_ACCESS_TOKEN \
--labels tf \
--idle-timeout 180
```

Now your machine will be listening for workflows from your project repository.

## Install CML as a package
### Local Package

In the examples above, CML is installed by the `setup-cml` action, or comes
pre-installed in a custom Docker image pulled by a CI runner. You can also
@@ -572,21 +572,19 @@ npm install -g vega-cli vega-lite
CML and Vega-Lite package installation require the NodeJS package manager
(`npm`) which ships with NodeJS. Installation instructions are below.

### Install NodeJS in GitHub
#### Install NodeJS

This is probably not necessary when using GitHub's default containers or one of
CML's Docker containers. Self-hosted runners may need to use a set up action to
install NodeJS:
- **GitHub**: This is probably not necessary when using GitHub's default
containers or one of CML's Docker containers. Self-hosted runners may need to
use a set up action to install NodeJS:

```bash
uses: actions/setup-node@v2
with:
  node-version: '12'
```

### Install NodeJS in GitLab

GitLab requires direct installation of NodeJS:
- **GitLab**: Requires direct installation.

```bash
curl -sL https://deb.nodesource.com/setup_12.x | bash
@@ -602,4 +600,7 @@ These are some example projects using CML.
- [CML with DVC to pull data](https://github.com/iterative/cml_dvc_case)
- [CML with Tensorboard](https://github.com/iterative/cml_tensorboard_case)
- [CML with a small EC2 instance](https://github.com/iterative/cml-runner-base-case)
:key:
- [CML with EC2 GPU](https://github.com/iterative/cml_cloud_case)

:key: needs a PAT (see [Environment variables](#environment-variables)).
