Merge branch 'master' into guide/external-disclaimer

iterative · Mar 14, 2021 · 06cf1da
2 parents 4210bbf + 84093cd
commit 06cf1da

Showing 48 changed files with 1,889 additions and 280 deletions.
diff --git a/content/blog/2021-02-22-cml-runner-prerelease.md b/content/blog/2021-02-22-cml-runner-prerelease.md
@@ -119,7 +119,7 @@ All the code to replicate this example is up on a
 ### Our favorite details
 
 The new `cml-runner` function lets you turn on instances, including GPU,
-high-memory and sport instances, and kick off a new workflow using the hardware
+high-memory and spot instances, and kick off a new workflow using the hardware
 and environment of your choice—and of course, it'll turn _off_ those instances
 after a configurable timeout! In the first CML release, this took
 [more than 30 lines of code](https://github.com/iterative/cml_cloud_case/blob/master/.github/workflows/cml.yaml)

diff --git a/content/blog/2021-03-03-dvc-2-0-release.md b/content/blog/2021-03-03-dvc-2-0-release.md
diff --git a/content/docs/api-reference/make_checkpoint.md b/content/docs/api-reference/make_checkpoint.md
@@ -41,7 +41,8 @@ implement the steps above yourself.
 
 The stage definition in `dvc.yaml` should contain at least one
 <abbr>output</abbr> with the `checkpoint: true` value set, so that DVC registers
-its checkpoints.
+its checkpoints. This is needed so that the experiment can later restart based
+on that output's last <abbr>cached</abbr> state.
 
 ⚠️ Using the `checkpoint` field in `dvc.yaml` is only compatible with
 `dvc exp run`; `dvc repro` will abort if any stage contains it.

diff --git a/content/docs/cml/cml-with-dvc.md b/content/docs/cml/cml-with-dvc.md
@@ -135,3 +135,59 @@ env:
 ```
 
 </details>
+
+## For GitHub Actions Users: Try the `setup-dvc` Action!
+
+The [iterative/setup-dvc](https://github.com/iterative/setup-dvc) action is a
+JavaScript action that sets up [DVC](https://dvc.org/) in your workflow.
+
+### Usage
+
+This action can be run on `ubuntu-latest`, `macos-latest`, and `windows-latest`.
+When running on `windows-latest`, Python 3 is a dependency that should be set up
+first (and
+[there's an action for that](https://github.com/actions/setup-python)).
+
+Basic usage:
+
+```yaml
+steps:
+  - uses: actions/checkout@v2
+
+  - uses: iterative/setup-dvc@v1
+```
+
+Windows:
+
+```yaml
+steps:
+  - uses: actions/checkout@v2
+
+  - uses: actions/setup-python@v2
+    with:
+      python-version: '3.x'
+
+  - uses: iterative/setup-dvc@v1
+```
+
+A specific version can be pinned to your workflow using the `version` argument.
+
+```yaml
+steps:
+  - uses: actions/checkout@v2
+
+  - uses: iterative/setup-dvc@v1
+    with:
+      version: '1.0.1'
+```
+
+### Inputs
+
+The following inputs are supported.
+
+- `version` - (optional) The version of DVC to install. A value of `latest` will
+  install the latest version of DVC. Defaults to `latest`.
+
+### Outputs
+
+Setup DVC has no outputs.
diff --git a/content/docs/cml/self-hosted-runners.md b/content/docs/cml/self-hosted-runners.md
@@ -1,161 +1,157 @@
 # Self-hosted Runners
 
-GitHub Actions are run on GitHub-hosted runners by default. However, there are
-many great reasons to use your own runners: to take advantage of GPUs; to
-orchestrate your team's shared computing resources, or to train in the cloud.
+GitHub Actions and GitLab CI are run on GitHub- and GitLab-hosted runners by
+default. However, there are many great reasons to use your own runners: to take
+advantage of GPUs, to orchestrate your team's shared computing resources, or to
+train in the cloud.
 
-☝️ **Tip!** Check out the
-[official GitHub documentation](https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
-to get started setting up your self-hosted runner.
+☝️ **Tip!** Check out the official documentation from
+[GitHub](https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
+and [GitLab](https://docs.gitlab.com/runner/) to get started setting up your
+self-hosted runner.
 
-### Allocating cloud resources with CML
+## Allocating cloud resources with CML
 
 When a workflow requires computational resources (such as GPUs) CML can
-automatically allocate cloud instances. For example, the following workflow
-deploys a `t2.micro` instance on AWS EC2 and trains a model on the instance.
-After the instance is idle for 120 seconds, it automatically shuts down.
+automatically allocate cloud instances using `cml-runner`. You can spin up
+instances on your AWS or Azure account (GCP support is forthcoming!).
+
+For example, the following workflow deploys a `t2.micro` instance on AWS EC2 and
+trains a model on the instance. After the job runs, the instance automatically
+shuts down. You might notice that this workflow is quite similar to the
+[basic use case](#usage) highlighted at the beginning of the docs. That's
+because it is! What's new is that we've added `cml-runner`, plus a few
+environmental variables for passing your cloud service credentials to the
+workflow.
 
 ```yaml
-name: train-my-model
+name: "Train-in-the-cloud"
 on: [push]
+
 jobs:
-  deploy-cloud-runner:
+  deploy-runner:
     runs-on: [ubuntu-latest]
-    container: docker://dvcorg/cml:latest
     steps:
-      - name: deploy
+      - uses: iterative/setup-cml@v1
+      - uses: actions/checkout@v2
+      - name: "Deploy runner on EC2"
+        shell: bash
         env:
-          repo_token: ${{ secrets.REPO_TOKEN }}
+          repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
           AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
           AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
         run: |
-          echo "Deploying..."
-
-          MACHINE="cml$(date +%s)"
-          docker-machine create \
-              --driver amazonec2 \
-              --amazonec2-instance-type t2.micro \
-              --amazonec2-region us-east-1 \
-              --amazonec2-zone f \
-              --amazonec2-vpc-id vpc-06bc773d85a0a04f7 \
-              --amazonec2-ssh-user ubuntu \
-              $MACHINE
-
-          eval "$(docker-machine env --shell sh $MACHINE)"
-
-          (
-          docker-machine ssh \
-            $MACHINE "sudo mkdir -p /docker_machine && \
-            sudo chmod 777 /docker_machine" && \
-          docker-machine scp -r -q ~/.docker/machine \ 
-            $MACHINE:/docker_machine && \
-
-          docker run --name runner -d \
-            -v /docker_machine/machine:/root/.docker/machine \
-            -e RUNNER_IDLE_TIMEOUT=120 \
-            -e DOCKER_MACHINE=${MACHINE} \
-            -e RUNNER_LABELS=cml \
-            -e repo_token=$repo_token \
-            -e RUNNER_REPO="https://github.com/${GITHUB_REPOSITORY}" \
-           dvcorg/cml-py3 && \
-
-          sleep 20 && echo "Deployed $MACHINE"
-          ) || (echo y | docker-machine rm $MACHINE && exit 1)
-  train:
-    needs: deploy-cloud-runner
-    runs-on: [self-hosted, cml]
-
+          cml-runner \
+          --cloud aws \
+          --cloud-region us-west \
+          --cloud-type=t2.micro \
+          --labels=cml-runner
+  model-training:
+    needs: deploy-runner
+    runs-on: [self-hosted,cml-runner]
+    container: docker://dvcorg/cml-py3:latest
     steps:
-      - uses: actions/checkout@v2
-      - name: cml_run
-        env:
-          repo_token: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          pip install -r requirements.txt
-          python train.py
-
-          cat metrics.txt >> report.md
-          cml-publish confusion_matrix.png --md >> report.md
-          cml-send-comment report.md
-```
-
-Please note that for GCP's Compute Engine, deploying the cloud runner involves
-different steps:
-
-```yaml
-deploy-gce:
-  runs-on: [ubuntu-latest]
-  container: docker://dvcorg/cml:latest
-
-  steps:
-    - name: deploy
-      shell: bash
+    - uses: actions/checkout@v2
+    - name: "Train my model"
       env:
-        repo_token: ${{ secrets.REPO_TOKEN }}
-        GOOGLE_APPLICATION_CREDENTIALS_DATA:
-          ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}
+        repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
       run: |
-        echo "Deploying..."
-
-        echo '${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}' \
-          > gce-credentials.json
-        export GOOGLE_APPLICATION_CREDENTIALS='gce-credentials.json'
-
-        RUNNER_LABELS="gce"
-        RUNNER_REPO="https://github.com/${GITHUB_REPOSITORY}"
-        MACHINE="cml$(date +%s)"
-
-        docker-machine create --driver google \
-          --google-machine-type "n1-standard-4" \
-          --google-project "cml-project-279709" \
-          $MACHINE
-
-        eval "$(docker-machine env --shell sh $MACHINE)"
-
-        (
-        docker-machine ssh \
-          $MACHINE "sudo mkdir -p /docker_machine && \
-          sudo chmod 777 /docker_machine" && \
-        docker-machine scp -r -q ~/.docker/machine/ \   
-          $MACHINE:/docker_machine && \
-        docker-machine scp -q gce-credentials.json \
-          $MACHINE:/docker_machine/gce-credentials.json && \
-
-        eval "$(docker-machine env --shell sh $MACHINE)" && \
-        docker run --name runner -d \
-          -v /docker_machine/gce-credentials.json:/gce-credentials.json \
-          -e GOOGLE_APPLICATION_CREDENTIALS='/gce-credentials.json' \
-          -v /docker_machine/machine:/root/.docker/machine \
-          -e DOCKER_MACHINE=$MACHINE \
-          -e repo_token=$repo_token \
-          -e RUNNER_LABELS=$RUNNER_LABELS \
-          -e RUNNER_REPO=$RUNNER_REPO \
-          -e RUNNER_IDLE_TIMEOUT=120 \
-          dvcorg/cml-py3 && \
-        sleep 20 && echo "Deployed $MACHINE"
-        ) || (docker-machine rm -f $MACHINE && exit 1)
+        pip install -r requirements.txt
+        python train.py
+
+        # Publish report with CML
+        cat metrics.txt > report.md
+        cml-send-comment report.md
 ```
 
-### Inputs
+In the above workflow, the job `deploy-runner` launches an EC2 `t2.micro`
+instance in the `us-west` region. The next job, `model-training`, runs on the
+newly launched instance.
+
+**Note that you can use any container with this workflow!** While you must have
+CML and its dependencies set up to use CML functions like `cml-send-comment` from
+your instance, you can create your favorite training environment in the cloud by
+pulling the Docker container of your choice.
+
+We like the
+[CML container](https://github.com/iterative/cml/blob/master/docker/Dockerfile)
+(`docker://dvcorg/cml-py3`) because it comes loaded with Python, CUDA, `git`,
+`node` and other essentials for full-stack data science. But we don't mind if
+you do it your way :)
+
+## Arguments
+
+The function `cml-runner` accepts the following arguments:
+
+| Argument              | Description                                                                                                   | Values                                                                                                         |
+| --------------------- | ------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
+| `--version`           | Show version number                                                                                           | Boolean                                                                                                        |
+| `--labels`            | Comma delimited runner labels                                                                                 | Default: `cml`                                                                                                 |
+| `--idle-timeout`      | Time in seconds for the runner to be waiting for jobs before shutting down. 0 waits forever.                  | Default: `300`                                                                                                 |
+| `--name`              | Name displayed in the repo once registered                                                                    | Default: `cml-` followed by a unique identifier, e.g. `cml-cfwj9rrari`                                         |
+| `--driver`            | If not specified, driver is inferred from environmental variables                                             | Choices: `github`,`gitlab`                                                                                     |
+| `--repo`              | Specifies the Git repository to be used. If not specified, repo is inferred from the environmental variables. | Example: `https://github.com/iterative/cml`                                                                    |
+| `--token`             | Personal access token to be used. If not specified, it will be inferred from the environment.                 | `token` should be a string                                                                                     |
+| `--cloud`             | Cloud provider to deploy the runner                                                                           | Choices: `aws`,`azure`                                                                                         |
+| `--cloud-region`      | Region where the instance is deployed.                                                                        | Choices: `us-east`,`us-west`, `eu-west`, `eu-north`. Also accepts native cloud regions. Defaults to `us-west`. |
+| `--cloud-type`        | Instance type.                                                                                                | Choices: `m`, `l`, `xl`. Also supports native types such as `t2.micro`                                         |
+| `--cloud-gpu`         | GPU type.                                                                                                     | Choices: `nogpu`,`k80`,`tesla`                                                                                 |
+| `--cloud-hdd-size`    | HDD size in GB.                                                                                               | Accepts integer values                                                                                         |
+| `--cloud-ssh-private` | Your private RSA SSH key. If not provided will be generated by the Terraform-provider-Iterative.              | Accepts string                                                                                                 |
+| `--cloud-spot`        | Request a spot instance                                                                                       | Boolean                                                                                                        |
+| `--cloud-spot-price`  | Spot max price. If not specified it takes current spot bidding pricing.                                       | Default: `-1`                                                                                                  |
+| `-h`                  | Show help                                                                                                     | Boolean                                                                                                        |
+
+## Environmental variables
 
 You will need to
-[create a new personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line),
-`REPO_TOKEN`, with repository read/write access. `REPO_TOKEN` must be added as a
-secret in your project repository.
+[create a personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)
+with repository read/write access and workflow privileges. In the example
+workflow, this token is stored as `PERSONAL_ACCESS_TOKEN`.
 
 Note that you will also need to provide access credentials for your cloud
 compute resources as secrets. In the above example, `AWS_ACCESS_KEY_ID` and
 `AWS_SECRET_ACCESS_KEY` are required to deploy EC2 instances.
 
-### Provisioning cloud compute
+Click below to see credentials needed for supported cloud service providers.
+
+<details>
 
-In the above example, we use
-[Docker Machine](https://docs.docker.com/machine/concepts/) to provision
-instances. Please see their documentation for further details.
+### AWS
 
-Note several CML-specific arguments to `docker run`:
+```yaml
+env:
+  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+  AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}
+```
+
+Note that `AWS_SESSION_TOKEN` is optional.
+
+</details>
+
+<details>
+
+### Azure
+
+```yaml
+env:
+  AZURE_STORAGE_CONNECTION_STRING:
+    ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
+  AZURE_STORAGE_CONTAINER_NAME: ${{ secrets.AZURE_STORAGE_CONTAINER_NAME }}
+```
+
+</details>
+
+### Using on-premise machines as self-hosted runners
+
+You can also use the new `cml-runner` function to set up a local self-hosted
+runner. On your local machine or on-premise GPU cluster, you'll install CML as a
+package and then run:
+
+```bash
+cml-runner --repo $your_project_repository_url --token=$personal_access_token \
+  --labels tf --idle-timeout 180
+```
 
-- `repo_token` should be set to your repository's personal access token
-- `RUNNER_REPO` should be set to the URL of your project repository
-- The docker container should be given as `dvcorg/cml`, `dvcorg/cml-py3`,
-  `dvc/org/cml-gpu`, or `dvcorg/cml-gpu-py3`
+Now your machine will be listening for workflows from your project repository.