
ref: extract technical details from exp run to guides #3182

Merged: 23 commits, Jan 28, 2022

Changes from 14 commits

Commits:
- `7cc3bc2` guide: link from exp init intro to -i example in ref. (jorgeorpinel, Jan 12, 2022)
- `30e9574` ref: simplify note about repro in exp run (jorgeorpinel, Jan 12, 2022)
- `ea1543b` ref: remove details about dvc exp codification (jorgeorpinel, Jan 12, 2022)
- `d8ba1f5` ref: remove motivation for using exp run (jorgeorpinel, Jan 12, 2022)
- `c9ab4b0` ref: remove emphasis on exp run --name (jorgeorpinel, Jan 12, 2022)
- `0d014b1` guide: bring exp show/diff/apply links from exp run ref (jorgeorpinel, Jan 12, 2022)
- `8341712` guide: bring details about clearing exps from exp run (jorgeorpinel, Jan 13, 2022)
- `fe46589` guide: simplify Checkpoints intro and (jorgeorpinel, Jan 13, 2022)
- `deb7723` ref: dissolve checkpoints details from exp run (jorgeorpinel, Jan 13, 2022)
- `f8113d2` ref: remove exp run --queue details and (jorgeorpinel, Jan 13, 2022)
- `d499eb9` ref: remove exp run --jobs details (parallel queue exec) (jorgeorpinel, Jan 13, 2022)
- `4c1b639` ref: format fixes in exp run (jorgeorpinel, Jan 13, 2022)
- `4aae871` ref: remove details on queued checkpoint exps from exp run Desc. (jorgeorpinel, Jan 13, 2022)
- `122e08f` ref: simplify example in exp run (jorgeorpinel, Jan 13, 2022)
- `379da9a` Update content/docs/command-reference/exp/run.md (jorgeorpinel, Jan 13, 2022)
- `2e0613a` Merge branch 'master' into ref/exp (jorgeorpinel, Jan 18, 2022)
- `5d4551e` Merge branch 'ref/exp' of github.com:iterative/dvc.org into ref/exp (jorgeorpinel, Jan 18, 2022)
- `a4691a6` ref: describe all major features of exp run (jorgeorpinel, Jan 18, 2022)
- `084a6ed` checkpoints: remove "in-code" term (jorgeorpinel, Jan 18, 2022)
- `5e5dfd3` guide: consolidate pipeline-related info in Running (jorgeorpinel, Jan 18, 2022)
- `3239633` guide: compress Params section in Running Exps (jorgeorpinel, Jan 18, 2022)
- `df74ce9` guide: summarize the Exps Queue section of Running Exps (jorgeorpinel, Jan 18, 2022)
- `64eaf8f` guide: undo changes to Checkpoints guide (jorgeorpinel, Jan 18, 2022)
165 changes: 36 additions & 129 deletions content/docs/command-reference/exp/run.md
@@ -1,6 +1,7 @@
# exp run

Run or resume an [experiment](/doc/command-reference/exp).
Run or resume a
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview).

## Synopsis

@@ -22,136 +23,43 @@ Provides a way to execute and track <abbr>experiments</abbr> in your
<abbr>project</abbr> without polluting it with unnecessary commits, branches,
directories, etc.

> `dvc exp run` is equivalent to `dvc repro` for experiments. It has the same
> behavior when it comes to `targets` and stage execution (restores the
> dependency graph, etc.). See the command [options](#options) for more on the
> differences.
> 📖 See the full [Running Experiments] guide for more information.

Before running an experiment, you'll probably want to make modifications such as
data and code updates, or <abbr>hyperparameter</abbr> tuning. For the latter,
you can use the `--set-param` (`-S`) option of this command to change
`dvc param` values on the fly.
<abbr>parameter</abbr> tuning. You can use the `--set-param` (`-S`) option to
change param values on the fly.

Each experiment creates and tracks a project variation based on your
<abbr>workspace</abbr> changes. Experiments will have a unique, auto-generated
name like `exp-bfe64` by default, which can be customized using the `--name`
(`-n`) option.
Comment on lines -35 to -38

Contributor Author: A lot was removed from this ref. (either moved to a guide, or it was already there), and some paragraphs got moved around. It may be easier to review by viewing the resulting page: https://dvc-org-ref-exp-pmsxtvqwhk3xsn.herokuapp.com/doc/command-reference/exp/run

Contributor Author: UPDATE: Per #3182 (comment), I reintroduced text that describes all major features of exp run to this ref.

> `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets`
> and stage execution (restores the dependency graph, etc.). See the command
> [options](#options) for more on the differences.

<details>

### ⚙️ How does DVC track experiments?

Experiments are custom
[Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked
out by DVC). Note that these commits are not pushed to Git remotes by default
(see `dvc exp push`).

</details>
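A minimal Python sketch of how such refs could be inspected is shown below. It assumes the DVC 2.x layout `refs/exps/<first 2 chars of baseline SHA>/<rest of SHA>/<exp name>`. That layout is our understanding and is not stated on this page. `ExpRef` and `parse_exp_ref` are hypothetical helpers, not part of DVC's API.

```python
from dataclasses import dataclass
from typing import Optional

# From this page: experiment refs live under .git/refs/exps
EXPS_NAMESPACE = "refs/exps"


@dataclass(frozen=True)
class ExpRef:
    """Hypothetical container for the parts of an experiment ref."""

    baseline_sha: str  # commit the experiment is based on (HEAD at run time)
    name: str  # experiment name, e.g. "exp-bfe64"


def parse_exp_ref(ref: str) -> Optional[ExpRef]:
    """Split 'refs/exps/<sha[:2]>/<sha[2:]>/<name>' into its parts.

    Returns None for refs outside the experiments namespace or with any
    other shape. The two-level SHA split is an assumed layout.
    """
    prefix = EXPS_NAMESPACE + "/"
    if not ref.startswith(prefix):
        return None
    parts = ref[len(prefix):].split("/")
    if len(parts) != 3:
        return None
    sha_head, sha_tail, name = parts
    return ExpRef(baseline_sha=sha_head + sha_tail, name=name)
```

On a real project, you could feed it the output of `git for-each-ref --format='%(refname)' refs/exps`. Refs with any other shape return `None`.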

The results of the last `dvc exp run` can be seen in the workspace. To display
and compare multiple experiments, use `dvc exp show` or `dvc exp diff`
(`dvc plots diff` also accepts experiment names as `revisions`). Use `dvc exp apply`
to restore the results of any other experiment instead.

Successful experiments can be made
[persistent](/doc/user-guide/experiment-management#persistent-experiments) by
committing them to the Git repo. Unnecessary ones can be removed with
`dvc exp remove` or `dvc exp gc` (or abandoned).

> Note that experiment data will remain in the <abbr>cache</abbr> until you use
> regular `dvc gc` to clean it up.

## Checkpoints
Successful experiments can be [made persistent] by committing them to the Git
repo. Unnecessary ones can be [cleared].

To track successive steps in a longer or deeper <abbr>experiment</abbr>, you can
register checkpoints from your code. Each `dvc exp run` will resume from the
last checkpoint.
It's possible to schedule experiments for later execution with
`dvc exp run --queue`. To actually run them, use `dvc exp run --run-all`. This
can execute them one by one (default) or in parallel (using the `--jobs`
option).

First, mark at least one stage <abbr>output</abbr> with `checkpoint: true` in
`dvc.yaml`. This is needed so that the experiment can resume later, based on the
<abbr>cached</abbr> output(s) (circular dependency).
> 📖 Learn more about the [experiments queue].

⚠️ Note that using `checkpoint` in `dvc.yaml` makes it incompatible with
`dvc repro`.

Then, use the `dvc.api.make_checkpoint()` function (Python code), or write a
signal file (any programming language) following the same steps as that
function.
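The signal-file steps can be sketched in Python as shown below. This is a minimal sketch based on our reading of DVC 2.x's `make_checkpoint()`, and it rests on three assumptions:

- The function does nothing unless `dvc exp run` has set the `DVC_CHECKPOINT` environment variable.
- It creates an empty `.dvc/tmp/DVC_CHECKPOINT` file under the project root, taken from `DVC_ROOT`.
- It blocks until DVC deletes that file after recording the checkpoint.

`signal_checkpoint` is a hypothetical name. Verify the file and variable names against your DVC version before porting this logic to another language.

```python
import os
import time
from typing import Optional


def signal_checkpoint(root_dir: Optional[str] = None, poll_interval: float = 0.1) -> bool:
    """Hypothetical re-implementation of the checkpoint signal-file steps.

    Returns False (doing nothing) when not running under `dvc exp run`,
    otherwise True once DVC has consumed the signal file.
    """
    # Assumption: `dvc exp run` sets DVC_CHECKPOINT for checkpoint stages.
    if os.getenv("DVC_CHECKPOINT") is None:
        return False

    # Assumption: the project root comes from DVC_ROOT (fallback: cwd).
    root_dir = root_dir or os.getenv("DVC_ROOT", os.getcwd())
    signal_file = os.path.join(root_dir, ".dvc", "tmp", "DVC_CHECKPOINT")

    # Write an empty file and force it to disk.
    with open(signal_file, "w", encoding="utf-8") as fobj:
        fobj.write("")
        fobj.flush()
        os.fsync(fobj.fileno())

    # Block until DVC removes the file (i.e. the checkpoint was recorded).
    while os.path.exists(signal_file):
        time.sleep(poll_interval)
    return True
```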

You can now use `dvc exp run` to begin the experiment. All checkpoints
registered at runtime will be preserved, even if the process gets interrupted
(e.g. with `[Ctrl] C`, or by an error). Without interruption, a "wrap-up"
checkpoint will be added (if needed), so that changes to pipeline outputs don't
remain in the workspace.

Subsequent uses of `dvc exp run` will continue from the latest checkpoint (using
the latest cached versions of all outputs).

<details>

### ⚙️ How are checkpoints captured?

Instead of a single commit, checkpoint experiments have multiple commits under
the custom Git reference (in `.git/refs/exps`), similar to a branch.

</details>
It's also possible to run special [checkpoint experiments] for deep learning.

List previous checkpoints with `dvc exp show`. To resume from a previous
checkpoint, you must first `dvc exp apply` it before using `dvc exp run`. For
`--queue` or `--temp` runs (see next section), use `--rev` instead to specify
the checkpoint to continue from.
> 📖 See [Running checkpoint experiments].

Alternatively, use `--reset` to start over (discards previous checkpoints and
their outputs). This is useful for re-training ML models, for example.

## Queueing and parallel execution

The `--queue` option lets you create an experiment as usual, except that nothing
is actually run. Instead, the experiment is put in a wait-list for later
execution. `dvc exp show` will mark queued experiments with an asterisk `*`.

> Note that queuing an experiment that uses checkpoints implies `--reset`,
> unless a `--rev` is provided (refer to the previous section).
Comment on lines -114 to -115

Contributor Author: Q: I removed this note from the ref. and it's not mentioned in any guide (it's still noted in the Options section for now). Should it be mentioned in a guide? Where? How?

Use `dvc exp run --run-all` to process the queue. This is done outside your
<abbr>workspace</abbr> (in temporary dirs in `.dvc/tmp/exps`) to preserve any
changes between/after queueing runs.

💡 You can also run a single experiment outside the workspace with
`dvc exp run --temp`, for example to continue working on the project meanwhile
(e.g. on another terminal).

> ⚠️ Note that only tracked files and directories will be included in
> `--queue/temp` experiments. To include untracked files, stage them with
> `git add` first (before `dvc exp run`). Feel free to `git reset` them
> afterwards. Git-ignored files/dirs are explicitly excluded from runs outside
> the workspace to avoid committing unwanted files into experiments.

<details>

### ⚙️ How are experiments queued?

A custom [Git stash](https://www.git-scm.com/docs/git-stash) is used to queue
pre-experiment commits.

</details>

Adding `-j` (`--jobs`), experiment queues can be run in parallel for better
performance (creates a tmp dir for each job).

⚠️ Parallel runs are experimental and may be unstable at this time. ⚠️ Make sure
you're using a number of jobs that your environment can handle (no more than the
CPU cores).

> Note that each job runs the entire pipeline (or `targets`) serially. DVC makes
> no attempt to distribute stage commands among jobs. The order in which they
> were queued is also not preserved when running them.
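The semantics described above can be illustrated with a toy Python model: queued experiments run outside the workspace, each job gets its own temporary directory, each job runs its whole pipeline serially, and completion order isn't guaranteed. This is not DVC's implementation. `QueuedExp`, `run_queue`, and the `run_pipeline` callback are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class QueuedExp:
    """Hypothetical queue entry: a name plus the params it was queued with."""

    name: str
    params: Dict[str, float] = field(default_factory=dict)


def run_queue(
    queue: List[QueuedExp],
    run_pipeline: Callable[[QueuedExp, str], dict],
    jobs: int = 1,
) -> Dict[str, dict]:
    """Toy model of `dvc exp run --run-all [--jobs N]`.

    Each queued experiment gets its own temporary directory (standing in
    for .dvc/tmp/exps), and its whole pipeline runs inside a single worker.
    Stages are never split across workers.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")

    def run_one(exp: QueuedExp) -> dict:
        with tempfile.TemporaryDirectory(prefix="exp-") as tmp_dir:
            return run_pipeline(exp, tmp_dir)

    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(run_one, exp): exp.name for exp in queue}
        # Results arrive in completion order, not in queue order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads keep the sketch short. A CPU-bound pipeline would more likely use processes, which is also why `jobs` should not exceed the available CPU cores.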
[running experiments]: /doc/user-guide/experiment-management/running-experiments
[made persistent]: /doc/user-guide/experiment-management/persisting-experiments
[cleared]: /doc/user-guide/experiment-management/cleaning-experiments
[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints
[running checkpoint experiments]:
/doc/user-guide/experiment-management/running-experiments#checkpoint-experiments
[experiments queue]:
/doc/user-guide/experiment-management/running-experiments#the-experiments-queue

## Options

> In addition to the following, `dvc exp run` accepts all the options in
> `dvc repro`, with the exception that `--no-commit` has no effect here.
> `dvc repro`, with the exception that `--no-commit` has no effect.

- `-S [<filename>:]<param_name>=<param_value>`,
`--set-param [<filename>:]<param_name>=<param_value>` - set the value of
@@ -169,8 +77,10 @@ CPU cores).

- `--queue` - place this experiment at the end of a line for future execution,
but don't actually run it yet. Use `dvc exp run --run-all` to process the
queue. For checkpoint experiments, this implies `--reset` unless a `--rev` is
provided.
queue.

> For checkpoint experiments, this implies `--reset` unless a `--rev` is
> provided.

- `--run-all` - run all queued experiments (see `--queue`) outside your
workspace (in `.dvc/tmp/exps`). Use `-j` to execute them
@@ -183,7 +93,7 @@ CPU cores).
- `-r <commit>`, `--rev <commit>` - continue an experiment from a specific
checkpoint name or hash (`commit`) in `--queue` or `--temp` runs.

- `--reset` - deletes `checkpoint` outputs before running this experiment
- `--reset` - deletes `checkpoint: true` outputs before running this experiment
(regardless of `dvc.lock`). Useful for ML model re-training.

- `-f`, `--force` - reproduce pipelines even if no changes were found (same as
@@ -200,8 +110,8 @@ CPU cores).

## Examples

> These examples are based on our [Get Started](/doc/start/experiments), where
> you can find the actual source code.
> This is based on our [Get Started](/doc/start/experiments), where you can find
> the actual source code.

<details>

@@ -256,19 +166,16 @@ experiment we just ran (`exp-44136`).

## Example: Modify parameters on-the-fly

You could modify a params file just like any other <abbr>dependency</abbr> and
run an experiment on that basis. Since this is a common need, `dvc exp run`
comes with the `--set-param` (`-S`) option built-in to update existing
parameters. This saves you the need to manually edit the params file.
`dvc exp run --set-param` (`-S`) saves you from having to manually edit the
params file before running an experiment.

```dvc
$ dvc exp run -S prepare.split=0.25 -S featurize.max_features=2000
...
Reproduced experiment(s): exp-18bf6
Experiment results have been applied to your workspace.
```
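The `-S` values above follow the `[<filename>:]<param_name>=<param_value>` syntax listed under Options. The hypothetical helpers below sketch what such an override means: one parses a single override, the other applies it to an already-loaded params dict. The sketch makes three assumptions:

- The file defaults to `params.yaml` when omitted.
- Dots separate nested keys.
- Values are parsed with `ast.literal_eval`, falling back to a plain string. DVC's own value parsing may differ in edge cases.

```python
import ast
from typing import Any, Dict, List, Tuple

DEFAULT_PARAMS_FILE = "params.yaml"  # DVC's default params file


def parse_set_param(arg: str) -> Tuple[str, List[str], Any]:
    """Parse '[<filename>:]<param_name>=<param_value>' (hypothetical helper)."""
    lhs, sep, raw_value = arg.partition("=")
    if not sep or not lhs:
        raise ValueError(f"expected <param_name>=<param_value>, got {arg!r}")
    # The optional file prefix is everything before the last ':' on the left side.
    filename, colon, param_name = lhs.rpartition(":")
    if not colon:
        filename = DEFAULT_PARAMS_FILE
    try:
        value = ast.literal_eval(raw_value)  # "0.25" -> 0.25, "2000" -> 2000
    except (ValueError, SyntaxError):
        value = raw_value  # keep unquoted strings such as "adam" as-is
    return filename, param_name.split("."), value


def apply_param(params: Dict[str, Any], key_path: List[str], value: Any) -> None:
    """Set a nested key such as ['prepare', 'split'] in a params dict."""
    node = params
    for key in key_path[:-1]:
        node = node.setdefault(key, {})
    node[key_path[-1]] = value
```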

To see the results, we can use `dvc exp diff` which compares both params and
To see the results, we can use `dvc exp diff`, which compares both params and
metrics to the previous project version:

```dvc
...
```
78 changes: 45 additions & 33 deletions content/docs/user-guide/experiment-management/checkpoints.md
@@ -2,24 +2,35 @@

_New in DVC 2.0_

To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This is especially helpful in machine learning, for
example to track the progress in deep learning techniques such as evolving
neural networks.

_Checkpoint experiments_ track a series of variations (the checkpoints) and
their execution can be stopped and resumed as needed. You interact with them
using the `--rev` and `--reset` options of `dvc exp run` (see also the
`checkpoint` field in `dvc.yaml` `outs`). They can help you
To track successive steps in a longer machine learning experiment, you can
register checkpoints from your code at runtime, for example to track the
progress with deep learning techniques. They can help you

- implement the best practice in deep learning to save your model weights as
checkpoints.
- track all code and data changes corresponding to the checkpoints.
- see when metrics start diverging and revert to the optimal checkpoint.
- automate the process of tracking every training epoch.

> Experiments and checkpoints are [implemented](/blog/experiment-refs) with
> hidden Git experiment commits branches.
Checkpoint [execution] can be stopped and resumed as needed. You interact with
them using the `--rev` and `--reset` options of `dvc exp run` (see also the
`checkpoint` field in `dvc.yaml` `outs`).

[execution]:
/doc/user-guide/experiment-management/running-experiments#checkpoint-experiments
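A training script that works with this stop/resume model might look like the sketch below. It makes three assumptions:

- A small JSON file stands in for a checkpoint output like `model.pt`.
- `train_epoch` is a placeholder for real training.
- `dvc.api.make_checkpoint()` (DVC 2.x) is replaced by a no-op when it can't be imported, so the script also runs standalone.

The script resumes from whatever output is present. DVC keeps the latest cached version of that output, and `--reset` deletes it, so the same script then starts over.

```python
import json
import os

try:
    from dvc.api import make_checkpoint  # DVC 2.x API
except ImportError:  # DVC unavailable: treat checkpoints as a no-op

    def make_checkpoint() -> None:
        pass


MODEL_PATH = "model.json"  # hypothetical stand-in for a checkpoint output


def load_state(path: str) -> dict:
    """Resume from the last checkpointed output, if it exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"epoch": 0, "weight": 0.0}  # fresh start (e.g. after --reset)


def train_epoch(state: dict, lr: float) -> dict:
    """Placeholder for one epoch of real training."""
    return {"epoch": state["epoch"] + 1, "weight": state["weight"] + lr}


def train(epochs: int, lr: float = 0.1, path: str = MODEL_PATH) -> dict:
    state = load_state(path)
    for _ in range(epochs):
        state = train_epoch(state, lr)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(state, f)  # write the output before signaling DVC
        make_checkpoint()  # ask DVC to record this epoch as a checkpoint
    return state


if __name__ == "__main__":
    train(epochs=3)
```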

<details>

### ⚙️ How are checkpoints captured?

Instead of a single commit like [regular experiments], checkpoint experiments
have multiple commits under the custom Git reference (in `.git/refs/exps`),
similar to a branch.

[regular experiments]:
/doc/user-guide/experiment-management/experiments-overview

</details>

Like with regular experiments, checkpoints can become persistent by
[committing them to Git](#committing-checkpoints-to-git).
@@ -62,38 +73,36 @@ running:
$ pip install -r requirements.txt
```

This will download all of the packages you need to run the example. Now you have
everything you need to get started with experiments and checkpoints.
This will download all of the packages you need to run the example.

To initialize this project as a <abbr>DVC repository</abbr>, use `dvc init`. Now
you have everything you need to get started with experiments and checkpoints.

</details>

## Setting up a DVC pipeline

DVC versions data and it also can version the ML model weights file as
checkpoints during the training process. To enable this, you will need to set up
a DVC pipeline to train your model.

Adding a DVC pipeline only takes a few commands. At the root of the project,
run:

```dvc
$ dvc init
```
DVC can version data as well as the ML model weights file in checkpoints during
the training process. To enable this, you will need to set up a
[DVC pipeline](/doc/start/data-pipelines) to train your model.

This sets up the files you need for your DVC pipeline to work.

Now we need to add a stage for training our model within a DVC pipeline. We'll
do that with `dvc stage add`, which we'll explain more later. For now, run the
following command:
Now we need to add a training stage to `dvc.yaml` including `checkpoint: true`
in its <abbr>output</abbr>. This tells DVC which <abbr>cached</abbr> output(s)
to use to resume the experiment later (a circular dependency). We'll do this
with `dvc stage add`.

```dvc
$ dvc stage add --name train --deps data/MNIST --deps train.py \
--checkpoints model.pt --plots-no-cache predictions.json \
--params seed,lr,weight_decay --live dvclive python train.py
$ dvc stage add --name train \
--deps data/MNIST --deps train.py \
--params seed,lr,weight_decay \
--checkpoints model.pt \
--plots-no-cache predictions.json \
--live dvclive \
python train.py
```

The `--live dvclive` option enables our special logger [DVCLive](/doc/dvclive),
which helps you register checkpoints from your code.
💡 The `--live dvclive` option enables our special logger
[DVCLive](/doc/dvclive), which helps you register checkpoints from code.

The checkpoints need to be enabled in DVC at the pipeline level. The
`-c` / `--checkpoints` option of the `dvc stage add` command defines the checkpoint
@@ -132,6 +141,9 @@ stages:
html: true
```

⚠️ Note that enabling checkpoints in a `dvc.yaml` file makes it incompatible
with `dvc repro`.

Before we go any further, this is a great point to add these changes to your Git
history. You can do that with the following commands:

@@ -2,9 +2,9 @@

Although DVC uses minimal resources to keep track of the experiments, they may
clutter tables and the workspace. DVC allows you to remove specific experiments from
the workspace or delete all not-yet-[persisted] experiments at once.
the workspace or delete the ones that are not [final] yet.

[persisted]: /doc/user-guide/experiment-management/persisting-experiments
[final]: /doc/user-guide/experiment-management/persisting-experiments

## Removing specific experiments

@@ -30,10 +30,13 @@ these to keep rather than which of these to remove. You can use `dvc exp gc` to
select a set of experiments to keep and the rest of them are _garbage
collected._

This command takes a _scope_ argument. The scope can be `workspace`,
`all-branches`, `all-tags`, `all-commits`. In garbage collection, the scope
determines the experiments to _keep_, i.e., experiments out of the scope of the
given flag are removed.
This command takes a `scope` argument. It accepts "workspace", "all-branches",
"all-tags", or "all-commits". This determines the experiments to _keep_, i.e.
experiments not in scope are removed.

> ⚠️ Note that experiment data remains in the <abbr>cache</abbr> until you run
> regular `dvc gc` separately to clean it up (if it's not needed by committed
> versions).
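The keep-vs-remove logic of the scope can be modeled in a few lines. This toy sketch is not DVC code. It assumes each experiment is identified by the commit it's based on, and that the caller has already resolved which commits the chosen scope covers. `gc_experiments` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def gc_experiments(
    experiments: Dict[str, str], scope_commits: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Toy model of experiment garbage collection.

    `experiments` maps experiment name -> baseline commit SHA.
    `scope_commits` are the commits the chosen scope resolves to.
    Returns (kept, removed): experiments outside the scope are removed.
    """
    in_scope = set(scope_commits)
    kept = sorted(n for n, base in experiments.items() if base in in_scope)
    removed = sorted(n for n, base in experiments.items() if base not in in_scope)
    return kept, removed
```

Removing an experiment only affects its Git reference. As noted above, its data stays in the cache until `dvc gc`.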
Comment on lines +37 to +39

Contributor Author: Also extracted from exp run ref.

### Keeping experiments in the workspace

@@ -109,6 +109,11 @@ $ dvc exp show
`dvc exp show` only tabulates experiments in the workspace and in `HEAD`. You
can use the `--all` flag to show all the experiments in the project instead.

Note that [queued experiments] will be marked with an asterisk `*`.

[queued experiments]:
/doc/user-guide/experiment-management/running-experiments#the-experiments-queue

Comment on lines +112 to +114

Contributor Author: Extracted from exp run ref.

## Customize the table of experiments

The table output may become cluttered if you have a large number of parameters
@@ -61,7 +61,7 @@ experiments. This includes the locations for expected <abbr>dependencies</abbr>
<abbr>metrics</abbr>, etc.). These assume [sane defaults] but can be customized
with the options of `dvc exp init`.

💡 We recommend adding the `-i` flag to use its `--interactive` mode. This will
💡 We recommend adding the `-i` flag to use its [interactive mode]. This will
ask you how to run the experiments, and guide you through customizing the
aforementioned locations (optional).

@@ -70,3 +70,4 @@ begin using DVC Experiments. Now you can move on to [running experiments][run]
(next).

[sane defaults]: /doc/command-reference/exp/init#description
[interactive mode]: /doc/command-reference/exp/init#example-interactive-mode