Commit

Merge branch 'master' into guide/external-disclaimer
jorgeorpinel committed Mar 14, 2021
2 parents 4210bbf + 84093cd commit 06cf1da
Showing 48 changed files with 1,889 additions and 280 deletions.
2 changes: 1 addition & 1 deletion content/blog/2021-02-22-cml-runner-prerelease.md
@@ -119,7 +119,7 @@ All the code to replicate this example is up on a
### Our favorite details

The new `cml-runner` function lets you turn on instances, including GPU,
- high-memory and sport instances, and kick off a new workflow using the hardware
+ high-memory and spot instances, and kick off a new workflow using the hardware
and environment of your choice—and of course, it'll turn _off_ those instances
after a configurable timeout! In the first CML release, this took
[more than 30 lines of code](https://github.com/iterative/cml_cloud_case/blob/master/.github/workflows/cml.yaml)
714 changes: 714 additions & 0 deletions content/blog/2021-03-03-dvc-2-0-release.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion content/docs/api-reference/make_checkpoint.md
@@ -41,7 +41,8 @@ implement the steps above yourself.

The stage definition in `dvc.yaml` should contain at least one
<abbr>output</abbr> with the `checkpoint: true` value set, so that DVC registers
- its checkpoints.
+ its checkpoints. This is needed so that the experiment can later restart based
+ on that output's last <abbr>cached</abbr> state.

⚠️ Using the `checkpoint` field in `dvc.yaml` is only compatible with
`dvc exp run`; `dvc repro` will abort if any stage contains it.
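
As a minimal sketch of the stage definition described above (not part of this commit), a `dvc.yaml` with one checkpoint output might look like the following. The stage name `train`, the script `train.py`, and the output path `model.pt` are hypothetical placeholders.

```yaml
# dvc.yaml (illustrative; stage name, command, and paths are assumptions)
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    outs:
      - model.pt:
          checkpoint: true # lets DVC register checkpoints for this output
```

A stage like this would be run with `dvc exp run`, since `dvc repro` aborts on stages that use `checkpoint`.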
56 changes: 56 additions & 0 deletions content/docs/cml/cml-with-dvc.md
@@ -135,3 +135,59 @@ env:
```

</details>

## For GitHub Actions Users: Try the `setup-dvc` Action!

The [iterative/setup-dvc](https://github.com/iterative/setup-dvc) action is a
JavaScript action that sets up [DVC](https://dvc.org/) in your workflow.

### Usage

This action can be run on `ubuntu-latest`, `macos-latest`, and `windows-latest`.
When running on `windows-latest`, Python 3 is a dependency that should be set up
first (and
[there's an action for that](https://github.com/actions/setup-python)).

Basic usage:

```yaml
steps:
  - uses: actions/checkout@v2
  - uses: iterative/setup-dvc@v1
```

Windows:

```yaml
steps:
  - uses: actions/checkout@v2
  - uses: actions/setup-python@v2
    with:
      python-version: '3.x'
  - uses: iterative/setup-dvc@v1
```

A specific version can be pinned to your workflow using the `version` argument.

```yaml
steps:
  - uses: actions/checkout@v2
  - uses: iterative/setup-dvc@v1
    with:
      version: '1.0.1'
```

### Inputs

The following inputs are supported.

- `version` - (optional) The version of DVC to install. A value of `latest` will
install the latest version of DVC. Defaults to `latest`.

### Outputs

Setup DVC has no outputs.
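
To show how the action could fit into a complete job, here is a hedged sketch (not part of this commit) that installs DVC, pulls data, and reproduces the pipeline. It assumes the repository already has a `dvc.yaml` and a configured S3 remote; the workflow name, job name, and secrets are placeholders.

```yaml
name: dvc-pipeline # hypothetical workflow name
on: [push]

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-dvc@v1
      - name: Pull data and reproduce the pipeline
        env:
          # Assumption: the DVC remote is an S3 bucket that reads these credentials
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          dvc pull
          dvc repro
```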
256 changes: 126 additions & 130 deletions content/docs/cml/self-hosted-runners.md
@@ -1,161 +1,157 @@
# Self-hosted Runners

GitHub Actions are run on GitHub-hosted runners by default. However, there are
many great reasons to use your own runners: to take advantage of GPUs; to
orchestrate your team's shared computing resources, or to train in the cloud.
GitHub Actions and GitLab CI run on GitHub- and GitLab-hosted runners by
default. However, there are many good reasons to use your own runners: to take
advantage of GPUs, to orchestrate your team's shared computing resources, or to
train in the cloud.

☝️ **Tip!** Check out the
[official GitHub documentation](https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
to get started setting up your self-hosted runner.
☝️ **Tip!** Check out the official documentation from
[GitHub](https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
and [GitLab](https://docs.gitlab.com/runner/) to get started setting up your
self-hosted runner.

### Allocating cloud resources with CML
## Allocating cloud resources with CML

When a workflow requires computational resources (such as GPUs) CML can
automatically allocate cloud instances. For example, the following workflow
deploys a `t2.micro` instance on AWS EC2 and trains a model on the instance.
After the instance is idle for 120 seconds, it automatically shuts down.
automatically allocate cloud instances using `cml-runner`. You can spin up
instances on your AWS or Azure account (GCP support is forthcoming!).

For example, the following workflow deploys a `t2.micro` instance on AWS EC2 and
trains a model on the instance. After the job runs, the instance automatically
shuts down. You might notice that this workflow is quite similar to the
[basic use case](#usage) highlighted at the beginning of the docs. That's
because it is! What's new is that we've added `cml-runner`, plus a few
environmental variables for passing your cloud service credentials to the
workflow.

```yaml
name: train-my-model
name: "Train-in-the-cloud"
on: [push]

jobs:
deploy-cloud-runner:
deploy-runner:
runs-on: [ubuntu-latest]
container: docker://dvcorg/cml:latest
steps:
- name: deploy
- uses: iterative/setup-cml@v1
- uses: actions/checkout@v2
- name: "Deploy runner on EC2"
shell: bash
env:
repo_token: ${{ secrets.REPO_TOKEN }}
repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
run: |
echo "Deploying..."
MACHINE="cml$(date +%s)"
docker-machine create \
--driver amazonec2 \
--amazonec2-instance-type t2.micro \
--amazonec2-region us-east-1 \
--amazonec2-zone f \
--amazonec2-vpc-id vpc-06bc773d85a0a04f7 \
--amazonec2-ssh-user ubuntu \
$MACHINE
eval "$(docker-machine env --shell sh $MACHINE)"
(
docker-machine ssh \
$MACHINE "sudo mkdir -p /docker_machine && \
sudo chmod 777 /docker_machine" && \
docker-machine scp -r -q ~/.docker/machine \
$MACHINE:/docker_machine && \
docker run --name runner -d \
-v /docker_machine/machine:/root/.docker/machine \
-e RUNNER_IDLE_TIMEOUT=120 \
-e DOCKER_MACHINE=${MACHINE} \
-e RUNNER_LABELS=cml \
-e repo_token=$repo_token \
-e RUNNER_REPO="https://github.com/${GITHUB_REPOSITORY}" \
dvcorg/cml-py3 && \
sleep 20 && echo "Deployed $MACHINE"
) || (echo y | docker-machine rm $MACHINE && exit 1)
train:
needs: deploy-cloud-runner
runs-on: [self-hosted, cml]

cml-runner \
--cloud aws \
--cloud-region us-west \
--cloud-type=t2.micro \
--labels=cml-runner
name: model-training
needs: deploy-runner
runs-on: [self-hosted,cml-runner]
container: docker://dvcorg/cml-py3:latest
steps:
- uses: actions/checkout@v2
- name: cml_run
env:
repo_token: ${{ secrets.GITHUB_TOKEN }}
run: |
pip install -r requirements.txt
python train.py
cat metrics.txt >> report.md
cml-publish confusion_matrix.png --md >> report.md
cml-send-comment report.md
```
Please note that for GCP's Compute Engine, deploying the cloud runner involves
different steps:
```yaml
deploy-gce:
runs-on: [ubuntu-latest]
container: docker://dvcorg/cml:latest

steps:
- name: deploy
shell: bash
- uses: actions/checkout@v2
- name: "Train my model"
env:
repo_token: ${{ secrets.REPO_TOKEN }}
GOOGLE_APPLICATION_CREDENTIALS_DATA:
${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}
repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
run: |
echo "Deploying..."
echo '${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}' \
> gce-credentials.json
export GOOGLE_APPLICATION_CREDENTIALS='gce-credentials.json'
RUNNER_LABELS="gce"
RUNNER_REPO="https://github.com/${GITHUB_REPOSITORY}"
MACHINE="cml$(date +%s)"
docker-machine create --driver google \
--google-machine-type "n1-standard-4" \
--google-project "cml-project-279709" \
$MACHINE
eval "$(docker-machine env --shell sh $MACHINE)"
(
docker-machine ssh \
$MACHINE "sudo mkdir -p /docker_machine && \
sudo chmod 777 /docker_machine" && \
docker-machine scp -r -q ~/.docker/machine/ \
$MACHINE:/docker_machine && \
docker-machine scp -q gce-credentials.json \
$MACHINE:/docker_machine/gce-credentials.json && \
eval "$(docker-machine env --shell sh $MACHINE)" && \
docker run --name runner -d \
-v /docker_machine/gce-credentials.json:/gce-credentials.json \
-e GOOGLE_APPLICATION_CREDENTIALS='/gce-credentials.json' \
-v /docker_machine/machine:/root/.docker/machine \
-e DOCKER_MACHINE=$MACHINE \
-e repo_token=$repo_token \
-e RUNNER_LABELS=$RUNNER_LABELS \
-e RUNNER_REPO=$RUNNER_REPO \
-e RUNNER_IDLE_TIMEOUT=120 \
dvcorg/cml-py3 && \
sleep 20 && echo "Deployed $MACHINE"
) || (docker-machine rm -f $MACHINE && exit 1)
pip install -r requirements.txt
python train.py
# Publish report with CML
cat metrics.txt > report.md
cml-send-comment report.md
```
### Inputs
In the above workflow, the job `deploy-runner` launches an EC2 `t2.micro`
instance in the `us-west` region. The next job, `model-training`, runs on the
newly launched instance.

**Note that you can use any container with this workflow!** While you must have
CML and its dependencies set up to use CML functions like `cml-send-comment` from
your instance, you can create your favorite training environment in the cloud by
pulling the Docker container of your choice.

We like the
[CML container](https://github.com/iterative/cml/blob/master/docker/Dockerfile)
(`docker://dvcorg/cml-py3`) because it comes loaded with Python, CUDA, `git`,
`node` and other essentials for full-stack data science. But we don't mind if
you do it your way :)

## Arguments

The function `cml-runner` accepts the following arguments:

| Argument | Description | Values |
| --------------------- | ------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `--version` | Show version number | Boolean |
| `--labels` | Comma delimited runner labels | Default: `cml` |
| `--idle-timeout`      | Time in seconds for the runner to be waiting for jobs before shutting down. 0 waits forever.                  | Default: `300`                                                                                                 |
| `--name`              | Name displayed in the repo once registered                                                                    | Default: `cml-` followed by a unique identifier, e.g. `cml-cfwj9rrari`                                         |
| `--driver` | If not specified, driver is inferred from environmental variables | Choices: `github`,`gitlab` |
| `--repo` | Specifies the Git repository to be used. If not specified, repo is inferred from the environmental variables. | Example: `https://github.com/iterative/cml` |
| `--token` | Personal access token to be used. If not specified, it will be inferred from the environment. | `token` should be a string |
| `--cloud`             | Cloud provider to deploy the runner                                                                           | Choices: `aws`,`azure`                                                                                         |
| `--cloud-region`      | Region where the instance is deployed.                                                                        | Choices: `us-east`, `us-west`, `eu-west`, `eu-north`. Also accepts native cloud regions. Defaults to `us-west`. |
| `--cloud-type`        | Instance type.                                                                                                | Choices: `m`, `l`, `xl`. Also supports native types, e.g. `t2.micro`                                           |
| `--cloud-gpu` | GPU type. | Choices: `nogpu`,`k80`,`tesla` |
| `--cloud-hdd-size` | HDD size in GB. | Accepts integer values |
| `--cloud-ssh-private` | Your private RSA SSH key. If not provided will be generated by the Terraform-provider-Iterative. | Accepts string |
| `--cloud-spot` | Request a spot instance | Boolean |
| `--cloud-spot-price`  | Spot max price. If not specified it takes current spot bidding pricing.                                       | Default: `-1`                                                                                                  |
| `-h` | Show help | Boolean |
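
As an illustration of how several of these arguments combine, the step below sketches a GPU spot-instance deployment on AWS. Only flags from the table are used; the values (`m`, `k80`, `600`, and the label `cml-gpu`) and the step name are illustrative choices, not recommendations from the source.

```yaml
- name: "Deploy GPU spot runner" # hypothetical step name
  shell: bash
  env:
    repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: |
    # Values are placeholders; see the table above for the accepted choices
    cml-runner \
      --cloud aws \
      --cloud-region us-west \
      --cloud-type m \
      --cloud-gpu k80 \
      --cloud-spot \
      --idle-timeout 600 \
      --labels cml-gpu
```

A later job would then target this runner with `runs-on: [self-hosted,cml-gpu]`, following the same pattern as the example workflow.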

## Environmental variables

You will need to
[create a new personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line),
`REPO_TOKEN`, with repository read/write access. `REPO_TOKEN` must be added as a
secret in your project repository.
[create a personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)
with repository read/write access and workflow privileges. In the example
workflow, this token is stored as `PERSONAL_ACCESS_TOKEN`.

Note that you will also need to provide access credentials for your cloud
compute resources as secrets. In the above example, `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` are required to deploy EC2 instances.

### Provisioning cloud compute
Click below to see credentials needed for supported cloud service providers.

<details>

In the above example, we use
[Docker Machine](https://docs.docker.com/machine/concepts/) to provision
instances. Please see their documentation for further details.
### AWS

Note several CML-specific arguments to `docker run`:
```yaml
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}
```

Note that `AWS_SESSION_TOKEN` is optional.

</details>

<details>

### Azure

```yaml
env:
  AZURE_STORAGE_CONNECTION_STRING:
    ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
  AZURE_STORAGE_CONTAINER_NAME: ${{ secrets.AZURE_STORAGE_CONTAINER_NAME }}
```

</details>

### Using on-premise machines as self-hosted runners

You can also use the new `cml-runner` function to set up a local self-hosted
runner. On your local machine or on-premise GPU cluster, you'll install CML as a
package and then run:

```bash
cml-runner \
  --repo $your_project_repository_url \
  --token=$personal_access_token \
  --labels tf \
  --idle-timeout 180
```

- `repo_token` should be set to your repository's personal access token
- `RUNNER_REPO` should be set to the URL of your project repository
- The docker container should be given as `dvcorg/cml`, `dvcorg/cml-py3`,
`dvc/org/cml-gpu`, or `dvcorg/cml-gpu-py3`
Now your machine will be listening for workflows from your project repository.
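
A job can then target that machine through its label. The sketch below assumes the runner was registered with `--labels tf`, as in the command above; the job name and training commands are placeholders.

```yaml
jobs:
  train-on-prem: # hypothetical job name
    runs-on: [self-hosted, tf]
    steps:
      - uses: actions/checkout@v2
      - name: "Train my model"
        run: |
          pip install -r requirements.txt
          python train.py
```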