guide: simplify Checkpoints intro and bring details about checkpoint tracking from exp run ref
jorgeorpinel committed Jan 13, 2022
1 parent 8341712 commit fe46589
Showing 2 changed files with 22 additions and 20 deletions.
9 changes: 0 additions & 9 deletions content/docs/command-reference/exp/run.md
@@ -66,15 +66,6 @@ remain in the workspace.
Subsequent uses of `dvc exp run` will continue from the latest checkpoint (using
the latest cached versions of all outputs).

-<details>
-
-### ⚙️ How are checkpoints captured?
-
-Instead of a single commit, checkpoint experiments have multiple commits under
-the custom Git reference (in `.git/refs/exps`), similar to a branch.
-
-</details>
-
List previous checkpoints with `dvc exp show`. To resume from a previous
checkpoint, you must first `dvc exp apply` it before using `dvc exp run`. For
`--queue` or `--temp` runs (see next section), use `--rev` instead to specify
33 changes: 22 additions & 11 deletions content/docs/user-guide/experiment-management/checkpoints.md
@@ -2,24 +2,35 @@

_New in DVC 2.0_

-To track successive steps in a longer experiment, you can register checkpoints
-from your code at runtime. This is especially helpful in machine learning, for
-example to track the progress in deep learning techniques such as evolving
-neural networks.
-
-_Checkpoint experiments_ track a series of variations (the checkpoints) and
-their execution can be stopped and resumed as needed. You interact with them
-using the `--rev` and `--reset` options of `dvc exp run` (see also the
-`checkpoint` field in `dvc.yaml` `outs`). They can help you
+To track successive steps in a longer machine learning experiment, you can
+register checkpoints from your code at runtime, for example to track the
+progress with deep learning techniques. They can help you

- implement the best practice in deep learning to save your model weights as
checkpoints.
- track all code and data changes corresponding to the checkpoints.
- see when metrics start diverging and revert to the optimal checkpoint.
- automate the process of tracking every training epoch.

-> Experiments and checkpoints are [implemented](/blog/experiment-refs) with
-> hidden Git experiment commits branches.
+Checkpoint [execution] can be stopped and resumed as needed. You interact with
+them using the `--rev` and `--reset` options of `dvc exp run` (see also the
+`checkpoint` field in `dvc.yaml` `outs`).
+
+[execution]:
+/doc/user-guide/experiment-management/running-experiments#checkpoint-experiments
+
+<details>
+
+### ⚙️ How are checkpoints captured?
+
+Instead of a single reference like [regular experiments], checkpoint experiments
+have multiple commits under the custom Git reference (in `.git/refs/exps`),
+similar to a branch.
+
+[regular experiments]:
+/doc/user-guide/experiment-management/experiments-overview
+
+</details>

Like with regular experiments, checkpoints can become persistent by
[committing them to Git](#committing-checkpoints-to-git).
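
Reviewer note: the revised intro says checkpoints are registered "from your code at runtime" and that execution "can be stopped and resumed as needed". A minimal sketch of that pattern may help readers picture it. Everything here is illustrative and not taken from the commit: `train`, `state_path`, and the `register` callback are hypothetical names, and the "training step" is a stand-in. In a DVC 2.x project the callback would presumably be whatever call registers a checkpoint with DVC; this sketch deliberately does not depend on DVC.

```python
import json
import os
import tempfile


def train(state_path, epochs, register=lambda: None):
    """Resume from state_path if it exists, then run `epochs` more steps.

    `register` is a hypothetical hook called after each saved step; it stands
    in for whatever mechanism records a checkpoint.
    """
    if os.path.exists(state_path):
        # A previous run left state behind: continue from it.
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"epoch": 0, "loss": 1.0}

    for _ in range(epochs):
        state["epoch"] += 1
        state["loss"] *= 0.5  # stand-in for a real training step
        with open(state_path, "w") as f:
            json.dump(state, f)  # persist the output first
        register()  # then signal that a checkpoint is ready
    return state


# Usage: run two epochs, "stop", then resume for one more.
calls = []
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "model.json")
first = train(path, 2, register=lambda: calls.append("ckpt"))
second = train(path, 1, register=lambda: calls.append("ckpt"))
```

The second call picks up at epoch 2 instead of starting over, which is the stop-and-resume behavior the intro describes. Saving the output before signaling keeps each recorded checkpoint consistent with the files on disk.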