ref: extract technical details from `exp run` to guides #3182
Changes from all commits
7cc3bc2
30e9574
ea1543b
d8ba1f5
c9ab4b0
0d014b1
8341712
fe46589
deb7723
f8113d2
d499eb9
4c1b639
4aae871
122e08f
379da9a
2e0613a
5d4551e
a4691a6
084a6ed
5e5dfd3
3239633
df74ce9
64eaf8f
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# exp run | ||
|
||
Run or resume an [experiment](/doc/command-reference/exp). | ||
Run or resume a | ||
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview). | ||
|
||
## Synopsis | ||
|
||
|
@@ -22,136 +23,46 @@ Provides a way to execute and track <abbr>experiments</abbr> in your | |
<abbr>project</abbr> without polluting it with unnecessary commits, branches, | ||
directories, etc. | ||
|
||
> `dvc exp run` is equivalent to `dvc repro` for experiments. It has the same | ||
> behavior when it comes to `targets` and stage execution (restores the | ||
> dependency graph, etc.). See the command [options](#options) for more on the | ||
> differences. | ||
> `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets` | ||
> and stage execution (restores the dependency graph, etc.). See the command | ||
> [options](#options) for more on the differences. | ||
|
||
Before running an experiment, you'll probably want to make modifications such as | ||
data and code updates, or <abbr>hyperparameter</abbr> tuning. For the latter, | ||
you can use the `--set-param` (`-S`) option of this command to change | ||
`dvc param` values on-the fly. | ||
Use the `--set-param` (`-S`) option as a shortcut to change | ||
<abbr>parameter</abbr> values [on-the-fly] before running the experiment. | ||
|
||
Each experiment creates and tracks a project variation based on your | ||
<abbr>workspace</abbr> changes. Experiments will have a unique, auto-generated | ||
name like `exp-bfe64` by default, which can be customized using the `--name` | ||
(`-n`) option. | ||
It's possible to [queue experiments] for later execution with the `--queue` | ||
flag. To actually run them, use `dvc exp run --run-all`. Queued experiments are | ||
run sequentially by default, but can be run in parallel using the `--jobs` | ||
option. | ||
|
||
<details> | ||
|
||
### ⚙️ How does DVC track experiments? | ||
|
||
Experiments are custom | ||
[Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References) | ||
(found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked | ||
out by DVC). Note that these commits are not pushed to Git remotes by default | ||
(see `dvc exp push`). | ||
|
||
</details> | ||
|
||
The results of the last `dvc exp run` can be seen in the workspace. To display | ||
and compare multiple experiments, use `dvc exp show` or `dvc exp diff` | ||
(`plots diff` also accepts experiment names as `revisions`). Use `dvc exp apply` | ||
to restore the results of any other experiment instead. | ||
|
||
Successful experiments can be made | ||
[persistent](/doc/user-guide/experiment-management#persistent-experiments) by | ||
committing them to the Git repo. Unnecessary ones can be removed with | ||
`dvc exp remove` or `dvc exp gc` (or abandoned). | ||
|
||
> Note that experiment data will remain in the <abbr>cache</abbr> until you use | ||
> regular `dvc gc` to clean it up. | ||
|
||
## Checkpoints | ||
|
||
To track successive steps in a longer or deeper <abbr>experiment</abbr>, you can | ||
register checkpoints from your code. Each `dvc exp run` will resume from the | ||
last checkpoint. | ||
|
||
First, mark at least one stage <abbr>output</abbr> with `checkpoint: true` in | ||
`dvc.yaml`. This is needed so that the experiment can resume later, based on the | ||
<abbr>cached</abbr> output(s) (circular dependency). | ||
|
||
⚠️ Note that using `checkpoint` in `dvc.yaml` makes it incompatible with | ||
`dvc repro`. | ||
|
||
Then, use the `dvc.api.make_checkpoint()` function (Python code), or write a | ||
signal file (any programming language) following the same steps as that | ||
function. | ||
|
||
You can now use `dvc exp run` to begin the experiment. All checkpoints | ||
registered at runtime will be preserved, even if the process gets interrupted | ||
(e.g. with `[Ctrl] C`, or by an error). Without interruption, a "wrap-up" | ||
checkpoint will be added (if needed), so that changes to pipeline outputs don't | ||
remain in the workspace. | ||
|
||
Subsequent uses of `dvc exp run` will continue from the latest checkpoint (using | ||
the latest cached versions of all outputs). | ||
|
||
<details> | ||
|
||
### ⚙️ How are checkpoints captured? | ||
|
||
Instead of a single commit, checkpoint experiments have multiple commits under | ||
the custom Git reference (in `.git/refs/exps`), similar to a branch. | ||
> ⚠️ Parallel runs are experimental and may be unstable. Make sure you're using | ||
> a number of jobs that your environment can handle (no more than the number of | ||
> CPU cores). | ||
|
||
</details> | ||
|
||
List previous checkpoints with `dvc exp show`. To resume from a previous | ||
checkpoint, you must first `dvc exp apply` it before using `dvc exp run`. For | ||
`--queue` or `--temp` runs (see next section), use `--rev` instead to specify | ||
the checkpoint to continue from. | ||
|
||
Alternatively, use `--reset` to start over (discards previous checkpoints and | ||
their outputs). This is useful for re-training ML models, for example. | ||
|
||
## Queueing and parallel execution | ||
|
||
The `--queue` option lets you create an experiment as usual, except that nothing | ||
is actually run. Instead, the experiment is put in a wait-list for later | ||
execution. `dvc exp show` will mark queued experiments with an asterisk `*`. | ||
It's also possible to run special [checkpoint experiments] that log the | ||
execution progress (useful for deep learning ML). The `--rev` and `--reset` | ||
options have special uses for these. | ||
|
||
> Note that queuing an experiment that uses checkpoints implies `--reset`, | ||
> unless a `--rev` is provided (refer to the previous section). | ||
Comment on lines -114 to -115
Q: I removed this note from the ref. and it's not mentioned in any guide (it's still noted in the Options section for now). Should it be mentioned in a guide? Where? How?
||
|
||
Use `dvc exp run --run-all` to process the queue. This is done outside your | ||
<abbr>workspace</abbr> (in temporary dirs in `.dvc/tmp/exps`) to preserve any | ||
changes between/after queueing runs. | ||
|
||
💡 You can also run a single experiment outside the workspace with | ||
`dvc exp run --temp`, for example to continue working on the project meanwhile | ||
(e.g. on another terminal). | ||
|
||
> ⚠️ Note that only tracked files and directories will be included in | ||
> `--queue/temp` experiments. To include untracked files, stage them with | ||
> `git add` first (before `dvc exp run`). Feel free to `git reset` them | ||
> afterwards. Git-ignored files/dirs are explicitly excluded from runs outside | ||
> the workspace to avoid committing unwanted files into experiments. | ||
|
||
<details> | ||
> 📖 See the [Running Experiments] guide for more details on all these features. | ||
|
||
### ⚙️ How are experiments queued? | ||
[Review] run experiments with `dvc exp show`. Successful ones can be [made | ||
persistent] by restoring them via `dvc exp branch` or `dvc exp apply` and | ||
committing them to the Git repo. Unnecessary ones can be [cleared] with | ||
`dvc exp gc`. | ||
|
||
A custom [Git stash](https://www.git-scm.com/docs/git-stash) is used to queue | ||
pre-experiment commits. | ||
|
||
</details> | ||
|
||
Adding `-j` (`--jobs`), experiment queues can be run in parallel for better | ||
performance (creates a tmp dir for each job). | ||
|
||
⚠️ Parallel runs are experimental and may be unstable at this time. ⚠️ Make sure | ||
you're using a number of jobs that your environment can handle (no more than the | ||
CPU cores). | ||
|
||
> Note that each job runs the entire pipeline (or `targets`) serially. DVC makes | ||
> no attempt to distribute stage commands among jobs. The order in which they | ||
> were queued is also not preserved when running them. | ||
[on-the-fly]: #example-modify-parameters-on-the-fly | ||
[queue experiments]: | ||
/doc/user-guide/experiment-management/running-experiments#the-experiments-queue | ||
[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints | ||
[running experiments]: /doc/user-guide/experiment-management/running-experiments | ||
[review]: /doc/user-guide/experiment-management/comparing-experiments | ||
[made persistent]: /doc/user-guide/experiment-management/persisting-experiments | ||
[cleared]: /doc/user-guide/experiment-management/cleaning-experiments | ||
|
||
## Options | ||
|
||
> In addition to the following, `dvc exp run` accepts all the options in | ||
> `dvc repro`, with the exception that `--no-commit` has no effect here. | ||
> `dvc repro`, with the exception that `--no-commit` has no effect. | ||
|
||
- `-S [<filename>:]<param_name>=<param_value>`, | ||
`--set-param [<filename>:]<param_name>=<param_value>` - set the value of | ||
|
@@ -169,8 +80,10 @@ CPU cores). | |
|
||
- `--queue` - place this experiment at the end of a line for future execution, | ||
but don't actually run it yet. Use `dvc exp run --run-all` to process the | ||
queue. For checkpoint experiments, this implies `--reset` unless a `--rev` is | ||
provided. | ||
queue. | ||
|
||
> For checkpoint experiments, this implies `--reset` unless a `--rev` is | ||
> provided. | ||
|
||
- `--run-all` - run all queued experiments (see `--queue`) outside your | ||
workspace (in `.dvc/tmp/exps`). Use `-j` to execute them | ||
|
@@ -180,10 +93,14 @@ CPU cores). | |
parallel. Only has an effect along with `--run-all`. Defaults to 1 (the queue | ||
is processed serially). | ||
|
||
> Note that since queued experiments are run isolated from each other, common | ||
> stages may sometimes be executed several times depending on the state of the | ||
> [run-cache] at that time. | ||
|
||
- `-r <commit>`, `--rev <commit>` - continue an experiment from a specific | ||
checkpoint name or hash (`commit`) in `--queue` or `--temp` runs. | ||
|
||
- `--reset` - deletes `checkpoint` outputs before running this experiment | ||
- `--reset` - deletes `checkpoint: true` outputs before running this experiment | ||
(regardless of `dvc.lock`). Useful for ML model re-training. | ||
|
||
- `-f`, `--force` - reproduce pipelines even if no changes were found (same as | ||
|
@@ -198,10 +115,12 @@ CPU cores). | |
|
||
- `-v`, `--verbose` - displays detailed tracing information. | ||
|
||
[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache | ||
|
||
## Examples | ||
|
||
> These examples are based on our [Get Started](/doc/start/experiments), where | ||
> you can find the actual source code. | ||
> This is based on our [Get Started](/doc/start/experiments), where you can find | ||
> the actual source code. | ||
|
||
<details> | ||
|
||
|
@@ -256,19 +175,16 @@ experiment we just ran (`exp-44136`). | |
|
||
## Example: Modify parameters on-the-fly | ||
|
||
You could modify a params file just like any other <abbr>dependency</abbr> and | ||
run an experiment on that basis. Since this is a common need, `dvc exp run` | ||
comes with the `--set-param` (`-S`) option built-in to update existing | ||
parameters. This saves you the need to manually edit the params file. | ||
`dvc exp run --set-param` (`-S`) saves you the need to manually edit the params | ||
file before running an experiment. | ||
|
||
```dvc | ||
$ dvc exp run -S prepare.split=0.25 -S featurize.max_features=2000 | ||
... | ||
Reproduced experiment(s): exp-18bf6 | ||
Experiment results have been applied to your workspace. | ||
``` | ||
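As a rough illustration of the `[<filename>:]<param_name>=<param_value>` form that `-S` accepts, the sketch below splits such an argument into its parts. It is not DVC's implementation: `parse_set_param` and `ParamOverride` are hypothetical names, and the fallback to `params.yaml` when no filename is given is an assumption based on DVC's default params file.

```python
from typing import NamedTuple

# Assumption: with no "<filename>:" prefix, the default params file is used.
DEFAULT_PARAMS_FILE = "params.yaml"


class ParamOverride(NamedTuple):
    """Hypothetical container for one -S/--set-param argument."""
    filename: str
    name: str
    value: str  # kept as raw text; no type conversion is attempted here


def parse_set_param(arg: str) -> ParamOverride:
    """Split '[<filename>:]<param_name>=<param_value>' into its parts."""
    # Everything before the first "=" is the target, the rest is the value.
    target, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {arg!r}")
    # An optional "<filename>:" prefix precedes the parameter name.
    filename, _, name = target.rpartition(":")
    if not name:
        raise ValueError(f"missing parameter name in {arg!r}")
    return ParamOverride(filename or DEFAULT_PARAMS_FILE, name, value)


if __name__ == "__main__":
    for raw in ("prepare.split=0.25", "featurize.max_features=2000"):
        print(parse_set_param(raw))
```

Dotted names such as `prepare.split` are left intact here; resolving them into nested keys of the params file is outside the scope of this sketch.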
|
||
To see the results, we can use `dvc exp diff` which compares both params and | ||
To see the results, you can use `dvc exp diff`. It compares both params and | ||
metrics to the previous project version: | ||
|
||
```dvc | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,9 +2,9 @@ | |
|
||
Although DVC uses minimal resources to keep track of the experiments, they may | ||
clutter tables and the workspace. DVC allows you to remove specific experiments from | ||
the workspace or delete all not-yet-[persisted] experiments at once. | ||
the workspace or delete the ones that are not [final] yet. | ||
|
||
[persisted]: /doc/user-guide/experiment-management/persisting-experiments | ||
[final]: /doc/user-guide/experiment-management/persisting-experiments | ||
|
||
## Removing specific experiments | ||
|
||
|
@@ -30,10 +30,13 @@ these to keep rather than which of these to remove. You can use `dvc exp gc` to | |
select a set of experiments to keep and the rest of them are _garbage | ||
collected._ | ||
|
||
This command takes a _scope_ argument. The scope can be `workspace`, | ||
`all-branches`, `all-tags`, `all-commits`. In garbage collection, the scope | ||
determines the experiments to _keep_, i.e., experiments out of the scope of the | ||
given flag are removed. | ||
This command takes a `scope` argument. It accepts "workspace", "all-branches", | ||
"all-tags", or "all-commits". This determines the experiments to _keep_, i.e. | ||
experiments not in scope are removed. | ||
|
||
> ⚠️ Note that experiment data remains in the <abbr>cache</abbr> until you use | ||
> regular `dvc gc` separately to clean it up (if it's not needed by committed | ||
> versions). | ||
Comment on lines +37 to +39
Also extracted from
||
|
||
### Keeping experiments in the workspace | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -109,6 +109,11 @@ $ dvc exp show | |
`dvc exp show` only tabulates experiments in the workspace and in `HEAD`. You | ||
can use the `--all` flag to show all the experiments in the project instead. | ||
|
||
Note that [queued experiments] will be marked with an asterisk `*`. | ||
|
||
[queued experiments]: | ||
Comment on lines +112 to +114
Extracted from
||
/doc/user-guide/experiment-management/running-experiments#the-experiments-queue | ||
|
||
## Customize the table of experiments | ||
|
||
The table output may become cluttered if you have a large number of parameters | ||
|
Lots removed from this ref. (either moved to a guide, or it was already there), and some paragraphs got moved around. May be easier to review by seeing the resulting https://dvc-org-ref-exp-pmsxtvqwhk3xsn.herokuapp.com/doc/command-reference/exp/run .
UPDATE: Per #3182 (comment) I reintroduced text that describes all major features of `exp run` to this ref.