guide: simplify Checkpoints (Exps) and other misc. related changes

Extracted from #3182
iterative · Jan 18, 2022 · 3e1a497
1 parent 062f8d5

Showing 3 changed files with 56 additions and 41 deletions.
diff --git a/content/docs/api-reference/make_checkpoint.md b/content/docs/api-reference/make_checkpoint.md
@@ -1,11 +1,14 @@
 # dvc.api.make_checkpoint()
 
-Make an [in-code checkpoint](/doc/user-guide/experiment-management/checkpoints).
+Make an in-code [checkpoint].
 
 ```py
 def make_checkpoint()
 ```
 
+[checkpoint]:
+  /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments
+
 #### Usage:
 
 ```py

diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md
@@ -2,24 +2,35 @@
 
 _New in DVC 2.0_
 
-To track successive steps in a longer experiment, you can register checkpoints
-from your code at runtime. This is especially helpful in machine learning, for
-example to track the progress in deep learning techniques such as evolving
-neural networks.
-
-_Checkpoint experiments_ track a series of variations (the checkpoints) and
-their execution can be stopped and resumed as needed. You interact with them
-using the `--rev` and `--reset` options of `dvc exp run` (see also the
-`checkpoint` field in `dvc.yaml` `outs`). They can help you
+To track successive steps in a longer machine learning experiment, you can
+register checkpoints from your code at runtime, for example to track the
+progress with deep learning techniques. They can help you
 
 - implement the best practice in deep learning to save your model weights as
   checkpoints.
 - track all code and data changes corresponding to the checkpoints.
 - see when metrics start diverging and revert to the optimal checkpoint.
 - automate the process of tracking every training epoch.
 
-> Experiments and checkpoints are [implemented](/blog/experiment-refs) with
-> hidden Git experiment commits branches.
+Checkpoint [execution] can be stopped and resumed as needed. You interact with
+them using the `--rev` and `--reset` options of `dvc exp run` (see also the
+`checkpoint` field in `dvc.yaml` `outs`).
+
+[execution]:
+  /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments
+
+<details>
+
+### ⚙️ How are checkpoints captured?
+
+Instead of a single reference like [regular experiments], checkpoint experiments
+have multiple commits under the custom Git reference (in `.git/refs/exps`),
+similar to a branch.
+
+[regular experiments]:
+  /doc/user-guide/experiment-management/experiments-overview
+
+</details>
 
 Like with regular experiments, checkpoints can become persistent by
 [committing them to Git](#committing-checkpoints-to-git).
@@ -62,38 +73,36 @@ running:
 $ pip install -r requirements.txt
 ```
 
-This will download all of the packages you need to run the example. Now you have
-everything you need to get started with experiments and checkpoints.
+This will download all of the packages you need to run the example.
+
+To initialize this project as a <abbr>DVC repository</abbr>, use `dvc init`. Now
+you have everything you need to get started with experiments and checkpoints.
 
 </details>
 
 ## Setting up a DVC pipeline
 
-DVC versions data and it also can version the ML model weights file as
-checkpoints during the training process. To enable this, you will need to set up
-a DVC pipeline to train your model.
-
-Adding a DVC pipeline only takes a few commands. At the root of the project,
-run:
-
-```dvc
-$ dvc init
-```
+DVC can version data as well as the ML model weights file in checkpoints during
+the training process. To enable this, you will need to set up a
+[DVC pipeline](/doc/start/data-pipelines) to train your model.
 
-This sets up the files you need for your DVC pipeline to work.
-
-Now we need to add a stage for training our model within a DVC pipeline. We'll
-do that with `dvc stage add`, which we'll explain more later. For now, run the
-following command:
+Now we need to add a training stage to `dvc.yaml` including `checkpoint: true`
+in its <abbr>output</abbr>. This tells DVC which <abbr>cached</abbr> output(s)
+to use to resume the experiment later (a circular dependency). We'll do this
+with `dvc stage add`.
 
 ```dvc
-$ dvc stage add --name train --deps data/MNIST --deps train.py \
-              --checkpoints model.pt --plots-no-cache predictions.json \
-              --params seed,lr,weight_decay --live dvclive python train.py
+$ dvc stage add --name train \
+                --deps data/MNIST --deps train.py \
+                --params seed,lr,weight_decay \
+                --checkpoints model.pt \
+                --plots-no-cache predictions.json \
+                --live dvclive \
+                python train.py
 ```
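Because of the circular dependency described above, `train.py` both reads and rewrites the `--checkpoints` output. Below is a minimal sketch of such a script. It assumes a toy JSON "model" in place of real weights, and `load_state`, `save_state` and `train` are hypothetical helpers, not part of DVC. Only `dvc.api.make_checkpoint()` is a real DVC 2.x call. The sketch falls back to a no-op stand-in when that import is unavailable.

```py
import json
import os

try:
    # Real DVC 2.x API; outside `dvc exp run` it returns without doing anything.
    from dvc.api import make_checkpoint
except ImportError:
    # Hypothetical no-op stand-in so the sketch runs where DVC 2.x is absent.
    def make_checkpoint():
        pass

# Must match the path passed to `dvc stage add --checkpoints`.
MODEL_PATH = "model.pt"


def load_state(path=MODEL_PATH):
    """Resume from the checkpoint output if it exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weight": 0.0}


def save_state(state, path=MODEL_PATH):
    """Overwrite the checkpoint output (JSON stands in for torch.save here)."""
    with open(path, "w") as f:
        json.dump(state, f)


def train(epochs, path=MODEL_PATH):
    state = load_state(path)
    for _ in range(epochs):
        state["weight"] += 0.1  # placeholder for one real training epoch
        state["epoch"] += 1
        save_state(state, path)  # write the new weights first...
        make_checkpoint()  # ...then signal DVC to record a checkpoint
    return state


if __name__ == "__main__":
    print(train(epochs=3))
```

According to the `checkpoint` field description, DVC reverts `model.pt` to its last cached version at `dvc exp run`. That restored file is what allows `load_state` to resume from the last recorded epoch instead of starting at 0.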
 
-The `--live dvclive` option enables our special logger [DVCLive](/doc/dvclive),
-which helps you register checkpoints from your code.
+💡 The `--live dvclive` option enables our special logger
+[DVCLive](/doc/dvclive), which helps you register checkpoints from code.
 
 The checkpoints need to be enabled in DVC at the pipeline level. The
 `-c / --checkpoint` option of the `dvc stage add` command defines the checkpoint
@@ -132,6 +141,9 @@ stages:
         html: true
 ```
 
+⚠️ Note that enabling checkpoints in a `dvc.yaml` file makes it incompatible
+with `dvc repro`.
+
 Before we go any further, this is a great point to add these changes to your Git
 history. You can do that with the following commands:
 

diff --git a/content/docs/user-guide/project-structure/pipelines-files.md b/content/docs/user-guide/project-structure/pipelines-files.md
@@ -381,13 +381,13 @@ validation and auto-completion.
 > These include a subset of the fields in `.dvc` file
 > [output entries](/doc/user-guide/project-structure/dvc-files#output-entries).
 
-| Field        | Description                                                                                                                                                                                                                                                                    |
-| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `cache`      | Whether or not this file or directory is <abbr>cached</abbr> (`true` by default). See the `--no-commit` option of `dvc add`.                                                                                                                                                   |
-| `remote`     | (Optional) name of the remote to use for pushing/fetching.                                                                                                                                                                                                                     |
-| `persist`    | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts                                                                                                                                     |
-| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [in-code checkpoints](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. |
-| `desc`       | (Optional) user description for this output. This doesn't affect any DVC operations.                                                                                                                                                                                           |
+| Field        | Description                                                                                                                                                                                                                                                                       |
+| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `cache`      | Whether or not this file or directory is <abbr>cached</abbr> (`true` by default). See the `--no-commit` option of `dvc add`.                                                                                                                                                      |
+| `remote`     | (Optional) name of the remote to use for pushing/fetching.                                                                                                                                                                                                                        |
+| `persist`    | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts)                                                                                                                                       |
+| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. |
+| `desc`       | (Optional) user description for this output. This doesn't affect any DVC operations.                                                                                                                                                                                              |
 
 ⚠️ Note that using the `checkpoint` field in `dvc.yaml` is not compatible with
 `dvc repro`.