From 7cc3bc22269f456824373fb5cab585121f4e08c6 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:16:50 -0600 Subject: [PATCH 01/21] guide: link from exp init intro to -i example in ref. --- .../user-guide/experiment-management/experiments-overview.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/experiments-overview.md b/content/docs/user-guide/experiment-management/experiments-overview.md index 82251c59e0..0eff5fbe53 100644 --- a/content/docs/user-guide/experiment-management/experiments-overview.md +++ b/content/docs/user-guide/experiment-management/experiments-overview.md @@ -61,7 +61,7 @@ experiments. This includes the locations for expected dependencies metrics, etc.). These assume [sane defaults] but can be customized with the options of `dvc exp init`. -💡 We recommend adding the `-i` flag to use its `--interactive` mode. This will +💡 We recommend adding the `-i` flag to use its [interactive mode]. This will ask you how to run the experiments, and guide you through customizing the aforementioned locations (optional). @@ -70,3 +70,4 @@ begin using DVC Experiments. Now you can move on to [running experiments][run] (next). 
[sane defaults]: /doc/command-reference/exp/init#description +[interactive mode]: /doc/command-reference/exp/init#example-interactive-mode From 30e9574d3206dad4218588204355771c46da8d7c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:32:48 -0600 Subject: [PATCH 02/21] ref: simplify note about repro in exp run --- content/docs/command-reference/exp/run.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 4dd24b6896..a1ef155bf1 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -22,10 +22,7 @@ Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc. -> `dvc exp run` is equivalent to `dvc repro` for experiments. It has the same -> behavior when it comes to `targets` and stage execution (restores the -> dependency graph, etc.). See the command [options](#options) for more on the -> differences. +> 📖 See full [Running Experiments] guide. Before running an experiment, you'll probably want to make modifications such as data and code updates, or hyperparameter tuning. For the latter, @@ -37,6 +34,12 @@ Each experiment creates and tracks a project variation based on your name like `exp-bfe64` by default, which can be customized using the `--name` (`-n`) option. +> `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets` +> and stage execution (restores the dependency graph, etc.). +> See the command [options](#options) for more on the differences. + +[running experiments]: /doc/user-guide/experiment-management/running-experiments +
### ⚙️ How does DVC track experiments? @@ -151,7 +154,7 @@ CPU cores). ## Options > In addition to the following, `dvc exp run` accepts all the options in -> `dvc repro`, with the exception that `--no-commit` has no effect here. +> `dvc repro`, with the exception that `--no-commit` has no effect. - `-S [:]=`, `--set-param [:]=` - set the value of From ea1543be7911a37511bf0239069e503a239ea281 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:33:55 -0600 Subject: [PATCH 03/21] ref: remove details about dvc exp codification --- content/docs/command-reference/exp/run.md | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index a1ef155bf1..97c358dc90 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -1,6 +1,7 @@ # exp run -Run or resume an [experiment](/doc/command-reference/exp). +Run or resume a +[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview). ## Synopsis @@ -40,18 +41,6 @@ name like `exp-bfe64` by default, which can be customized using the `--name` [running experiments]: /doc/user-guide/experiment-management/running-experiments -
- -### ⚙️ How does DVC track experiments? - -Experiments are custom -[Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References) -(found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked -out by DVC). Note that these commits are not pushed to Git remotes by default -(see `dvc exp push`). - -
- The results of the last `dvc exp run` can be seen in the workspace. To display and compare multiple experiments, use `dvc exp show` or `dvc exp diff` (`plots diff` also accepts experiment names as `revisions`). Use `dvc exp apply` From d8ba1f5c46a6e9ad249fbac507b9c34ff4277ae6 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:39:49 -0600 Subject: [PATCH 04/21] ref: remove motivation for using exp run Alrd in Running Exps --- content/docs/command-reference/exp/run.md | 5 ++--- .../user-guide/experiment-management/running-experiments.md | 2 +- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 97c358dc90..a39404ce4e 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -26,9 +26,8 @@ directories, etc. > 📖 See full [Running Experiments] guide. Before running an experiment, you'll probably want to make modifications such as -data and code updates, or hyperparameter tuning. For the latter, -you can use the `--set-param` (`-S`) option of this command to change -`dvc param` values on-the fly. +parameter tuning. You can use the `--set-param` (`-S`) option to +change param values on-the fly. Each experiment creates and tracks a project variation based on your workspace changes. Experiments will have a unique, auto-generated diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 0edca46ab0..313f5bd734 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -99,7 +99,7 @@ Going to reproduce stage: 'train'... continue? 
[y/n] ## (Hyper)parameters -Parameters are the values that modify the underlying code's +ML parameters are the values that modify the underlying code's behavior, producing different experiment results. Machine learning experimentation, for example, involves searching hyperparameters that improve the resulting model metrics. From c9ab4b01a70efbc4520ba2690ed4e8549138f22c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:43:21 -0600 Subject: [PATCH 05/21] ref: remove emphasis on exp run --name Alrd (hidden) in Exps Overview --- content/docs/command-reference/exp/run.md | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index a39404ce4e..ae0727dd3f 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -23,22 +23,15 @@ Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc. -> 📖 See full [Running Experiments] guide. +> 📖 See full [Running Experiments] guide for more information. Before running an experiment, you'll probably want to make modifications such as parameter tuning. You can use the `--set-param` (`-S`) option to change param values on-the fly. -Each experiment creates and tracks a project variation based on your -workspace changes. Experiments will have a unique, auto-generated -name like `exp-bfe64` by default, which can be customized using the `--name` -(`-n`) option. - > `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets` -> and stage execution (restores the dependency graph, etc.). -> See the command [options](#options) for more on the differences. - -[running experiments]: /doc/user-guide/experiment-management/running-experiments +> and stage execution (restores the dependency graph, etc.). See the command +> [options](#options) for more on the differences. 
The results of the last `dvc exp run` can be seen in the workspace. To display and compare multiple experiments, use `dvc exp show` or `dvc exp diff` @@ -53,6 +46,8 @@ committing them to the Git repo. Unnecessary ones can be removed with > Note that experiment data will remain in the cache until you use > regular `dvc gc` to clean it up. +[running experiments]: /doc/user-guide/experiment-management/running-experiments + ## Checkpoints To track successive steps in a longer or deeper experiment, you can From 0d014b19c071089eb3089f10d0ea13bcfbf0baac Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 17:51:52 -0600 Subject: [PATCH 06/21] guide: bring exp show/diff/apply links from exp run ref into Running Exps --- content/docs/command-reference/exp/run.md | 5 ----- .../running-experiments.md | 20 ++++++++++++++++--- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index ae0727dd3f..339b93730d 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -33,11 +33,6 @@ change param values on-the fly. > and stage execution (restores the dependency graph, etc.). See the command > [options](#options) for more on the differences. -The results of the last `dvc exp run` can be seen in the workspace. To display -and compare multiple experiments, use `dvc exp show` or `dvc exp diff` -(`plots diff` also accepts experiment names as `revisions`). Use `dvc exp apply` -to restore the results of any other experiment instead. - Successful experiments can be made [persistent](/doc/user-guide/experiment-management#persistent-experiments) by committing them to the Git repo. 
Unnecessary ones can be removed with diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 313f5bd734..ea57473718 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -15,9 +15,7 @@ DVC relies on pipelines that codify experiment workflows (code, `dvc.yaml` file. These contain the commands to run the experiments. > 📖 See [Get Started: Data Pipelines](/doc/start/data-pipelines) for an intro -> to this topic. -> Here we assume that there's already a working `dvc.yaml` file in the -> project. +> to this topic. [ug-pipeline-files]: /doc/user-guide/project-structure/pipelines-files @@ -97,6 +95,22 @@ Going to reproduce stage: 'train'... continue? [y/n] > Note that `dvc exp run` is an experimentation-specific alternative to > `dvc repro`. +### Working with the results + +The results of the last `dvc exp run` can be seen in the workspace +and are stored and tracked internally by DVC. + +To display and compare multiple experiments, use `dvc exp show` or +`dvc exp diff`. `plots diff` also accepts experiments as `revisions`. See +[Reviewing and Comparing Experiments][reviewing] for more details. + +Use `dvc exp apply` to restore the results of any other experiment instead. See +[Bring experiment results to your workspace][apply] for more. 
+ +[reviewing]: /doc/user-guide/experiment-management/comparing-experiments +[apply]: + /doc/user-guide/experiment-management/persisting-experiments#bring-experiment-results-to-your-workspace + ## (Hyper)parameters ML parameters are the values that modify the underlying code's From 8341712838ccd5aa57ebd09aa9997b979fb153aa Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 18:01:16 -0600 Subject: [PATCH 07/21] guide: bring details about clearing exps from exp run into Cleaning Up Exps --- content/docs/command-reference/exp/run.md | 11 ++++------- .../experiment-management/cleaning-experiments.md | 15 +++++++++------ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 339b93730d..826c84d89c 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -33,15 +33,12 @@ change param values on-the fly. > and stage execution (restores the dependency graph, etc.). See the command > [options](#options) for more on the differences. -Successful experiments can be made -[persistent](/doc/user-guide/experiment-management#persistent-experiments) by -committing them to the Git repo. Unnecessary ones can be removed with -`dvc exp remove`or `dvc exp gc` (or abandoned). - -> Note that experiment data will remain in the cache until you use -> regular `dvc gc` to clean it up. +Successful experiments can be [made persistent] by committing them to the Git +repo. Unnecessary ones can be [cleared]. 
[running experiments]: /doc/user-guide/experiment-management/running-experiments +[persistent]: /doc/user-guide/experiment-management/persisting-experiments +[cleared]: /doc/user-guide/experiment-management/cleaning-experiments ## Checkpoints diff --git a/content/docs/user-guide/experiment-management/cleaning-experiments.md b/content/docs/user-guide/experiment-management/cleaning-experiments.md index 8a54ee91ec..5064ec7bee 100644 --- a/content/docs/user-guide/experiment-management/cleaning-experiments.md +++ b/content/docs/user-guide/experiment-management/cleaning-experiments.md @@ -2,9 +2,9 @@ Although DVC uses minimal resources to keep track of the experiments, they may clutter tables and the workspace. DVC allows to remove specific experiments from -the workspace or delete all not-yet-[persisted] experiments at once. +the workspace or delete the ones that are not [final] yet. -[persisted]: /doc/user-guide/experiment-management/persisting-experiments +[final]: /doc/user-guide/experiment-management/persisting-experiments ## Removing specific experiments @@ -30,10 +30,13 @@ these to keep rather than which of these to remove. You can use `dvc exp gc` to select a set of experiments to keep and the rest of them are _garbage collected._ -This command takes a _scope_ argument. The scope can be `workspace`, -`all-branches`, `all-tags`, `all-commits`. In garbage collection, the scope -determines the experiments to _keep_, i.e., experiments out of the scope of the -given flag are removed. +This command takes a `scope` argument. It accepts "workspace", "all-branches", +"all-tags", or "all-commits". This determines the experiments to _keep_, i.e. +experiments not in scope are removed. + +> ⚠️ Note that experiment remains in the cache until you use +> regular `dvc gc` separately to clean it up (if it's not needed by committed +> versions).
### Keeping experiments in the workspace From fe4658901201ebfb505eccc98df845c57d7a80f8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 18:48:15 -0600 Subject: [PATCH 08/21] guide: simplify Checkpoints intro and bring details about checkpoint tracking from exp run ref --- content/docs/command-reference/exp/run.md | 9 ----- .../experiment-management/checkpoints.md | 33 ++++++++++++------- 2 files changed, 22 insertions(+), 20 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 826c84d89c..691281b69e 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -66,15 +66,6 @@ remain in the workspace. Subsequent uses of `dvc exp run` will continue from the latest checkpoint (using the latest cached versions of all outputs). -
- -### ⚙️ How are checkpoints captured? - -Instead of a single commit, checkpoint experiments have multiple commits under -the custom Git reference (in `.git/refs/exps`), similar to a branch. - -
- List previous checkpoints with `dvc exp show`. To resume from a previous checkpoint, you must first `dvc exp apply` it before using `dvc exp run`. For `--queue` or `--temp` runs (see next section), use `--rev` instead to specify diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index dea0641189..a1b4332477 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -2,15 +2,9 @@ _New in DVC 2.0_ -To track successive steps in a longer experiment, you can register checkpoints -from your code at runtime. This is especially helpful in machine learning, for -example to track the progress in deep learning techniques such as evolving -neural networks. - -_Checkpoint experiments_ track a series of variations (the checkpoints) and -their execution can be stopped and resumed as needed. You interact with them -using the `--rev` and `--reset` options of `dvc exp run` (see also the -`checkpoint` field in `dvc.yaml` `outs`). They can help you +To track successive steps in a longer machine learning experiment, you can +register checkpoints from your code at runtime, for example to track the +progress with deep learning techniques. They can help you - implement the best practice in deep learning to save your model weights as checkpoints. @@ -18,8 +12,25 @@ using the `--rev` and `--reset` options of `dvc exp run` (see also the - see when metrics start diverging and revert to the optimal checkpoint. - automate the process of tracking every training epoch. -> Experiments and checkpoints are [implemented](/blog/experiment-refs) with -> hidden Git experiment commits branches. +Checkpoint [execution] can be stopped and resumed as needed. You interact with +them using the `--rev` and `--reset` options of `dvc exp run` (see also the +`checkpoint` field in `dvc.yaml` `outs`). 
+ +[execution]: + /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments + +
+ +### ⚙️ How are checkpoints captured? + +Instead of a single reference like [regular experiments], checkpoint experiments +have multiple commits under the custom Git reference (in `.git/refs/exps`), +similar to a branch. + +[regular experiments]: + /doc/user-guide/experiment-management/experiments-overview + +
Like with regular experiments, checkpoints can become persistent by [committing them to Git](#committing-checkpoints-to-git). From deb7723cd8b24353fb95ce683016275aa2c82560 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 19:28:32 -0600 Subject: [PATCH 09/21] ref: dissolve checkpoints details from exp run into Running Exps and Checkpoints guides --- content/docs/command-reference/exp/run.md | 41 +++-------------- .../experiment-management/checkpoints.md | 45 ++++++++++--------- .../running-experiments.md | 15 ++++--- 3 files changed, 39 insertions(+), 62 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 691281b69e..6d12de7ed8 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -36,43 +36,16 @@ change param values on-the fly. Successful experiments can be [made persistent] by committing them to the Git repo. Unnecessary ones can be [cleared]. +It's also possible to run special [checkpoint experiments] for deep learning ML. + +> 📖 See also [Running checkpoint experiments][run-checkpoints]. + [running experiments]: /doc/user-guide/experiment-management/running-experiments [persistent]: /doc/user-guide/experiment-management/persisting-experiments [cleared]: /doc/user-guide/experiment-management/cleaning-experiments - -## Checkpoints - -To track successive steps in a longer or deeper experiment, you can -register checkpoints from your code. Each `dvc exp run` will resume from the -last checkpoint. - -First, mark at least stage output with `checkpoint: true` in -`dvc.yaml`. This is needed so that the experiment can resume later, based on the -cached output(s) (circular dependency). - -⚠️ Note that using `checkpoint` in `dvc.yaml` makes it incompatible with -`dvc repro`. - -Then, use the `dvc.api.make_checkpoint()` function (Python code), or write a -signal file (any programming language) following the same steps as that -function.
- -You can now use `dvc exp run` to begin the experiment. All checkpoints -registered at runtime will be preserved, even if the process gets interrupted -(e.g. with `[Ctrl] C`, or by an error). Without interruption, a "wrap-up" -checkpoint will be added (if needed), so that changes to pipeline outputs don't -remain in the workspace. - -Subsequent uses of `dvc exp run` will continue from the latest checkpoint (using -the latest cached versions of all outputs). - -List previous checkpoints with `dvc exp show`. To resume from a previous -checkpoint, you must first `dvc exp apply` it before using `dvc exp run`. For -`--queue` or `--temp` runs (see next section), use `--rev` instead to specify -the checkpoint to continue from. - -Alternatively, use `--reset` to start over (discards previous checkpoints and -their outputs). This is useful for re-training ML models, for example. +[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints +[running checkpoint experiments]: + /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments ## Queueing and parallel execution diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index a1b4332477..2c5e433274 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -73,38 +73,36 @@ running: $ pip install -r requirements.txt ``` -This will download all of the packages you need to run the example. Now you have -everything you need to get started with experiments and checkpoints. +This will download all of the packages you need to run the example. + +To initialize this project as a DVC repository, use `dvc init`. Now +you have everything you need to get started with experiments and checkpoints.
## Setting up a DVC pipeline -DVC versions data and it also can version the ML model weights file as -checkpoints during the training process. To enable this, you will need to set up -a DVC pipeline to train your model. +DVC can version data as well as the ML model weights file in checkpoints during +the training process. To enable this, you will need to set up a +[DVC pipeline](/doc/start/data-pipelines) to train your model. -Adding a DVC pipeline only takes a few commands. At the root of the project, -run: +Now we need to add a training stage to `dvc.yaml` including `checkpoint: true` +in its output. This tells DVC which cached output(s) +to use to resume the experiment later (a circular dependency). We'll do this +with `dvc stage add`. ```dvc -$ dvc init +$ dvc stage add --name train \ + --deps data/MNIST --deps train.py \ + --params seed,lr,weight_decay \ + --checkpoints model.pt \ + --plots-no-cache predictions.json \ + --live dvclive \ + python train.py ``` -This sets up the files you need for your DVC pipeline to work. - -Now we need to add a stage for training our model within a DVC pipeline. We'll -do that with `dvc stage add`, which we'll explain more later. For now, run the -following command: - -```dvc -$ dvc stage add --name train --deps data/MNIST --deps train.py \ - --checkpoints model.pt --plots-no-cache predictions.json \ - --params seed,lr,weight_decay --live dvclive python train.py -``` - -The `--live dvclive` option enables our special logger [DVCLive](/doc/dvclive), -which helps you register checkpoints from your code. +💡 The `--live dvclive` option enables our special logger +[DVCLive](/doc/dvclive), which helps you register checkpoints from code. The checkpoints need to be enabled in DVC at the pipeline level. The `-c / --checkpoint` option of the `dvc stage add` command defines the checkpoint @@ -143,6 +141,9 @@ stages: html: true ``` +⚠️ Note that enabling checkpoints in a `dvc.yaml` file makes it incompatible +with `dvc repro`.
+ Before we go any further, this is a great point to add these changes to your Git history. You can do that with the following commands: diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index ea57473718..fa2064e7b4 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -284,9 +284,15 @@ CPU cores). ## Checkpoint experiments To track successive steps in a longer or deeper experiment, you can -register [checkpoints](/doc/user-guide/experiment-management/checkpoints) from -your code. Running checkpoint experiments is no different than with regular -ones, e.g.: +register "checkpoints" from your code. These combine DVC Experiments with code +logging. The latter can be achieved either with [DVCLive](/doc/dvclive), by +using `dvc.api.make_checkpoint()` (Python code), or writing signal files (any +programming language) following the same steps as `make_checkpoint()`. + +> 📖 See [Checkpoints](/doc/user-guide/experiment-management/checkpoints) to +> learn more about this feature. + +Running checkpoint experiments is no different than with regular ones, e.g.: ```dvc $ dvc exp run -S param=value @@ -308,6 +314,3 @@ their outputs). This is useful for re-training ML models, for example. > Note that queuing an experiment that uses checkpoints implies `--reset`, > unless a `--rev` is provided (refer to the previous section). - -> 📖 See [Checkpoints](/doc/user-guide/experiment-management/checkpoints) to -> learn more about this feature. 
From f8113d2bcc12d84d71f5abd102d09b1fcc56d077 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 22:21:01 -0600 Subject: [PATCH 10/21] ref: remove exp run --queue details and guide: bring missing info from ref., reorg text --- content/docs/command-reference/exp/run.md | 50 ++++--------- .../comparing-experiments.md | 5 ++ .../running-experiments.md | 70 +++++++++---------- 3 files changed, 54 insertions(+), 71 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 6d12de7ed8..71a76b59a8 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -38,7 +38,16 @@ repo. Unnecessary ones can be [cleared]. It's also possible to run special [checkpoint experiments] for deep learning ML. -> 📖 See also [Running checkpoint experiments][run-checkpoints]. +> 📖 See also [Running checkpoint experiments]. + +It's also possible to use `dvc exp run --queue` to schedule experiments for +later execution. `dvc exp run --run-all` will actually run them later, either +one by one (default) or in parallel (using the `--jobs` option). + +> Note that queuing an experiment that uses checkpoints implies `--reset`, +> unless a `--rev` is provided (refer to the previous section). + +> 📖 Learn more about the [experiments queue]. [running experiments]: /doc/user-guide/experiment-management/running-experiments [persistent]: /doc/user-guide/experiment-management/persisting-experiments @@ -46,41 +55,12 @@ It's also possible to run special [checkpoint experiments] for deep learning ML. 
[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints [running checkpoint experiments]: /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments +[experiments queue]: + /doc/user-guide/experiment-management/running-experiments#the-experiments-queue -## Queueing and parallel execution - -The `--queue` option lets you create an experiment as usual, except that nothing -is actually run. Instead, the experiment is put in a wait-list for later -execution. `dvc exp show` will mark queued experiments with an asterisk `*`. - -> Note that queuing an experiment that uses checkpoints implies `--reset`, -> unless a `--rev` is provided (refer to the previous section). - -Use `dvc exp run --run-all` to process the queue. This is done outside your -workspace (in temporary dirs in `.dvc/tmp/exps`) to preserve any -changes between/after queueing runs. - -💡 You can also run a single experiment outside the workspace with -`dvc exp run --temp`, for example to continue working on the project meanwhile -(e.g. on another terminal). - -> ⚠️ Note that only tracked files and directories will be included in -> `--queue/temp` experiments. To include untracked files, stage them with -> `git add` first (before `dvc exp run`). Feel free to `git reset` them -> afterwards. Git-ignored files/dirs are explicitly excluded from runs outside -> the workspace to avoid committing unwanted files into experiments. - -
- -### ⚙️ How are experiments queued? - -A custom [Git stash](https://www.git-scm.com/docs/git-stash) is used to queue -pre-experiment commits. - -
- -Adding `-j` (`--jobs`), experiment queues can be run in parallel for better -performance (creates a tmp dir for each job). +Adding `-j` (`--jobs`), +[experiment queues](/doc/user-guide/experiment-management/running-experiments#the-experiments-queue) +can be run in parallel for better performance (creates a tmp dir for each job). ⚠️ Parallel runs are experimental and may be unstable at this time. ⚠️ Make sure you're using a number of jobs that your environment can handle (no more than the diff --git a/content/docs/user-guide/experiment-management/comparing-experiments.md b/content/docs/user-guide/experiment-management/comparing-experiments.md index 6a274687ca..33ddd29002 100644 --- a/content/docs/user-guide/experiment-management/comparing-experiments.md +++ b/content/docs/user-guide/experiment-management/comparing-experiments.md @@ -109,6 +109,11 @@ $ dvc exp show `dvc exp show` only tabulates experiments in the workspace and in `HEAD`. You can use `--all` flag to show all the experiments in the project instead. +Note that [queued experiments] will be marked with an asterisk `*`. + +[queued experiments]: + /doc/user-guide/experiment-management/running-experiments#the-experiments-queue + ## Customize the table of experiments The table output may become cluttered if you have a large number of parameters diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index fa2064e7b4..811555acba 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -177,8 +177,8 @@ $ dvc exp run --set-param model.learning_rate=0.0002 ... ``` -> Note that parameters are attached to experiments so you can view them together -> with `dvc exp show` and `dvc exp diff`. +> Note that parameters are attached to experiments so they're shown together +> when [reviewing] them (e.g.
in `dvc exp show`). To set more than one param for the same experiment, use the `-S` option multiple times: @@ -192,6 +192,17 @@ $ dvc exp run -S learning_rate=0.001 -S units=128 The `--queue` option of `dvc exp run` tells DVC to append an experiment for later execution. Nothing is actually run yet. +```dvc +$ dvc exp run --queue -S units=10 +Queued experiment '1cac8ca' for future execution. +$ dvc exp run --queue -S units=64 +Queued experiment '23660bb' for future execution. +$ dvc exp run --queue -S units=128 +Queued experiment '3591a5c' for future execution. +$ dvc exp run --queue -S units=256 +Queued experiment '4109ead' for future execution. +``` +
### How are experiments queued? @@ -205,32 +216,29 @@ is found in `.git/refs/exps`, and earlier ones are in its [reflog].
+Each experiment is derived from the workspace at the time it's +queued. If you make changes in the workspace afterwards, they won't be reflected +in queued experiments (once run). + +Run them all one-by-one with the `--run-all` flag. For isolation, this is done +outside your workspace (in temporary directories). + +> Note that the order of execution is independent of their creation order. + ```dvc -$ dvc exp run --queue -S units=10 -Queued experiment '1cac8ca' for future execution. -$ dvc exp run --queue -S units=64 -Queued experiment '23660bb' for future execution. -$ dvc exp run --queue -S units=128 -Queued experiment '3591a5c' for future execution. -$ dvc exp run --queue -S units=256 -Queued experiment '4109ead' for future execution. +$ dvc exp run --run-all ``` -Each experiment is derived from the workspace at the time it's queued. If you -make changes in the workspace afterwards, they won't be reflected in queued -experiments (once run). -
-### How are queued experiments isolated? (Temporary directories) +### How are queued experiments isolated? -To guarantee that queued experiments derive from their original workspace, DVC -creates a copy of it in `.dvc/tmp/exps/`, where the experiment will run. All -these workspaces share the main project cache. +DVC creates a copy of the experiment's original workspace in `.dvc/tmp/exps/` +and runs it there. All workspaces share the single project cache, +however. -If you want to isolate an experiments this way without queuing it, you can use -the `--temp` option. This allows you to continue working while a long experiment -runs. +💡 To isolate any experiment (without queuing it), you can use the `--temp` +flag. This allows you to continue working while a long experiment runs, e.g.: ```dvc $ nohup dvc exp run --temp & @@ -238,26 +246,16 @@ $ nohup dvc exp run --temp & nohup: ignoring input and appending output to 'nohup.out' ``` -> The above example creates a `nohup.log` file in the original workspace with -> the output of the DVC process. - -Note that Git-ignored files/dirs are explicitly excluded from queued/temp runs -to avoid committing unwanted files into Git (e.g. once successful experiments -are [persisted]). - -[persisted]: /doc/user-guide/experiment-management/persisting-experiments +Note that Git-ignored files/dirs are excluded from queued/temp runs to avoid +committing unwanted files into Git (e.g. once successful experiments are +[persisted]). > 💡 To include untracked files, stage them with `git add` first (before > `dvc exp run`) and `git reset` them afterwards. -
- -Run them all one-by-one with the `--run-all` flag. The order of execution is -independent of their creation order. +[persisted]: /doc/user-guide/experiment-management/persisting-experiments -```dvc -$ dvc exp run --run-all -``` + To remove all experiments from the queue and start over, you can use `dvc exp remove --queue`. From d499eb92159646b3a58c0d7609c694f95d4e757d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 22:24:31 -0600 Subject: [PATCH 11/21] ref: remove exp run --jobs details (parallel queue exec) Alrd in Running Exps --- content/docs/command-reference/exp/run.md | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 71a76b59a8..b3a94c0186 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -58,18 +58,6 @@ one by one (default) or in parallel (using the `--jobs` option). [experiments queue]: /doc/user-guide/experiment-management/running-experiments#the-experiments-queue -Adding `-j` (`--jobs`), -[experiment queues](/doc/user-guide/experiment-management/running-experiments#the-experiments-queue) -can be run in parallel for better performance (creates a tmp dir for each job). - -⚠ī¸ Parallel runs are experimental and may be unstable at this time. ⚠ī¸ Make sure -you're using a number of jobs that your environment can handle (no more than the -CPU cores). - -> Note that each job runs the entire pipeline (or `targets`) serially. DVC makes -> no attempt to distribute stage commands among jobs. The order in which they -> were queued is also not preserved when running them. 
- ## Options > In addition to the following, `dvc exp run` accepts all the options in From 4c1b639f1c1b02073d79e1e628245799d1126e1d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 22:33:03 -0600 Subject: [PATCH 12/21] ref: format fixes in exp run --- content/docs/command-reference/exp/run.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index b3a94c0186..884e7189a9 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -36,21 +36,22 @@ change param values on-the fly. Successful experiments can be [made persistent] by committing them to the Git repo. Unnecessary ones can be [cleared]. -It's also possible to run special [checkpoint experiments] for deep learning ML. - -> 📖 See also [Running checkpoint experiments]. - -It's also possible to use `dvc exp run --queue` to schedule experiments for -later execution. `dvc exp run --run-all` will actually run them later, either -one by one (default) or in parallel (using the `--jobs` option). +It's possible to schedule experiments for later execution with +`dvc exp run --queue`. To actually run them, use `dvc exp run --run-all`. This +can execute them one by one (default) or in parallel (using the `--jobs` +option). > Note that queuing an experiment that uses checkpoints implies `--reset`, > unless a `--rev` is provided (refer to the previous section). > 📖 Learn more about the [experiments queue]. +It's also possible to run special [checkpoint experiments] for deep learning ML. + +> 📖 See [Running checkpoint experiments]. 
+ [running experiments]: /doc/user-guide/experiment-management/running-experiments -[persistent]: /doc/user-guide/experiment-management/persisting-experiments +[made persistent]: /doc/user-guide/experiment-management/persisting-experiments [cleared]: /doc/user-guide/experiment-management/cleaning-experiments [checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints [running checkpoint experiments]: From 4aae87104564ab8dc5ddad518aef2f28449d1f16 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 22:38:43 -0600 Subject: [PATCH 13/21] ref: remove details on queued checkpoint exps from exp run Desc. It's mentioned in the Options (revised). --- content/docs/command-reference/exp/run.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 884e7189a9..1f2a7e8fbe 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -41,9 +41,6 @@ It's possible to schedule experiments for later execution with can execute them one by one (default) or in parallel (using the `--jobs` option). -> Note that queuing an experiment that uses checkpoints implies `--reset`, -> unless a `--rev` is provided (refer to the previous section). - > 📖 Learn more about the [experiments queue]. It's also possible to run special [checkpoint experiments] for deep learning ML. @@ -80,8 +77,10 @@ It's also possible to run special [checkpoint experiments] for deep learning ML. - `--queue` - place this experiment at the end of a line for future execution, but don't actually run it yet. Use `dvc exp run --run-all` to process the - queue. For checkpoint experiments, this implies `--reset` unless a `--rev` is - provided. + queue. + + > For checkpoint experiments, this implies `--reset` unless a `--rev` is + > provided. 
- `--run-all` - run all queued experiments (see `--queue`) and outside your workspace (in `.dvc/tmp/exps`). Use `-j` to execute them @@ -94,7 +93,7 @@ It's also possible to run special [checkpoint experiments] for deep learning ML. - `-r `, `--rev ` - continue an experiment from a specific checkpoint name or hash (`commit`) in `--queue` or `--temp` runs. -- `--reset` - deletes `checkpoint` outputs before running this experiment +- `--reset` - deletes `checkpoint: true` outputs before running this experiment (regardless of `dvc.lock`). Useful for ML model re-training. - `-f`, `--force` - reproduce pipelines even if no changes were found (same as From 122e08f8c11a5082b0347eb00a8316c670a8b2c7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 12 Jan 2022 23:38:40 -0600 Subject: [PATCH 14/21] ref: simplify example in exp run merge explanation into corresponding guide --- content/docs/command-reference/exp/run.md | 13 +++++-------- .../experiment-management/running-experiments.md | 7 ++++--- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 1f2a7e8fbe..486bc444c6 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -110,8 +110,8 @@ It's also possible to run special [checkpoint experiments] for deep learning ML. ## Examples -> These examples are based on our [Get Started](/doc/start/experiments), where -> you can find the actual source code. +> This is based on our [Get Started](/doc/start/experiments), where you can find +> the actual source code.
@@ -166,19 +166,16 @@ experiment we just ran (`exp-44136`). ## Example: Modify parameters on-the-fly -You could modify a params file just like any other dependency and -run an experiment on that basis. Since this is a common need, `dvc exp run` -comes with the `--set-param` (`-S`) option built-in to update existing -parameters. This saves you the need to manually edit the params file. +`dvc exp run --set-param` (`-S`) saves you the need to manually edit the params +file before running an experiment. ```dvc $ dvc exp run -S prepare.split=0.25 -S featurize.max_features=2000 ... Reproduced experiment(s): exp-18bf6 -Experiment results have been applied to your workspace. ``` -To see the results, we can use `dvc exp diff` which compares both params and +To see the results, we can use `dvc exp diff`, which compares both params and metrics to the previous project version: ```dvc diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 811555acba..c568bdf843 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -164,9 +164,10 @@ $ dvc exp run -S myparams.toml:learning_rate = 0.0001 ### Updating experiment parameters on-the-fly -DVC allows to update parameters from command line when running experiments. The -`--set-param` (`-S`) option takes an existing parameter name and its value, and -updates the params file before the run. +You could manually edit a params file and run an experiment on that basis. Since +this is a common sequence, the built-in option `dvc exp run --set-param` (`-S`) +is provided as a shortcut. It takes an existing param name and its value, and +updates the file before the run for you.
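To make the effect of `--set-param` (`-S`) more concrete, here is a minimal sketch of applying a `name=value` override to params that were loaded into a nested dict. It is an illustration only: `set_param` is a hypothetical helper, and parsing the value as JSON is an assumption made for this example, so DVC's own handling may differ.

```python
import json


def set_param(params, assignment):
    """Apply one 'dotted.name=value' override to a nested dict, in place.

    Hypothetical helper: only existing params can be updated.
    """
    path, _, raw = assignment.partition("=")
    *parents, leaf = path.split(".")
    node = params
    for key in parents:
        node = node[key]  # KeyError if a parent section is missing
    if leaf not in node:
        raise KeyError(f"unknown param: {path}")
    try:
        node[leaf] = json.loads(raw)  # numbers, booleans, lists
    except json.JSONDecodeError:
        node[leaf] = raw  # fall back to a plain string
    return params


params = {"model": {"learning_rate": 0.001, "units": 64}}
set_param(params, "model.learning_rate=0.0002")
set_param(params, "model.units=128")
```

Rejecting unknown names mirrors the text's point that the option takes an existing param name; a typo then fails early and is not silently added to the file.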
```dvc $ cat params.yaml From 379da9a2cc27b833680047c5b0511700ef598142 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 13 Jan 2022 13:12:58 -0600 Subject: [PATCH 15/21] Update content/docs/command-reference/exp/run.md Co-authored-by: Emre Sahin --- content/docs/command-reference/exp/run.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 486bc444c6..9366385a3f 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -175,7 +175,7 @@ $ dvc exp run -S prepare.split=0.25 -S featurize.max_features=2000 Reproduced experiment(s): exp-18bf6 ``` -To see the results, we can use `dvc exp diff`, which compares both params and +To see the results, you can use `dvc exp diff`. It compares both params and metrics to the previous project version: ```dvc From a4691a643a0b0abd30b8563bd826907fb0a344b8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 20:41:44 -0600 Subject: [PATCH 16/21] ref: describe all major features of exp run per https://github.com/iterative/dvc.org/pull/3182#issuecomment-1012558741 --- content/docs/command-reference/exp/run.md | 38 +++++++++++------------ 1 file changed, 18 insertions(+), 20 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 9366385a3f..00a13d14f1 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -23,38 +23,36 @@ Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc. -> 📖 See full [Running Experiments] guide for more information. - -Before running an experiment, you'll probably want to make modifications such as -parameter tuning. You can use the `--set-param` (`-S`) option to -change param values on-the fly. 
- > `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets` > and stage execution (restores the dependency graph, etc.). See the command > [options](#options) for more on the differences. -Successful experiments can be [made persistent] by committing them to the Git -repo. Unnecessary ones can be [cleared]. +Use the `--set-param` (`-S`) option as a shortcut to change +parameter values [on-the-fly] before running the experiment. -It's possible to schedule experiments for later execution with -`dvc exp run --queue`. To actually run them, use `dvc exp run --run-all`. This -can execute them one by one (default) or in parallel (using the `--jobs` -option). +It's possible to [queue experiments] for later execution with the `--queue` flag +(nothing is actually executed). To run them, use `dvc exp run --run-all`. Queued +experiments are run one by one by default, but can be run in parallel using the +`--jobs` option. -> 📖 Learn more about the [experiments queue]. +It's also possible to run special [checkpoint experiments] that log the +execution progress (useful for deep learning ML). The `--rev` and `--reset` +options are specific to these. -It's also possible to run special [checkpoint experiments] for deep learning ML. +> 📖 See the [Running Experiments] guide for more details on all these features. -> 📖 See [Running checkpoint experiments]. +Successful experiments can be [made persistent] by restoring them via +`dvc exp branch` or `dvc exp apply` and committing them to the Git repo. +Unnecessary ones can be [cleared] with `dvc exp gc`. 
+[on-the-fly]: + /doc/user-guide/experiment-management/running-experiments#updating-experiment-parameters-on-the-fly +[queue experiments]: + /doc/user-guide/experiment-management/running-experiments#the-experiments-queue +[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints [running experiments]: /doc/user-guide/experiment-management/running-experiments [made persistent]: /doc/user-guide/experiment-management/persisting-experiments [cleared]: /doc/user-guide/experiment-management/cleaning-experiments -[checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints -[running checkpoint experiments]: - /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments -[experiments queue]: - /doc/user-guide/experiment-management/running-experiments#the-experiments-queue ## Options From 084a6edbd33fd3979987845467e31aa3a7f73afb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 20:42:08 -0600 Subject: [PATCH 17/21] checkpoints: remove "in-code" term --- content/docs/api-reference/make_checkpoint.md | 5 ++++- .../project-structure/pipelines-files.md | 14 +++++++------- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/content/docs/api-reference/make_checkpoint.md b/content/docs/api-reference/make_checkpoint.md index 9a3cb2a33d..705b10f3b8 100644 --- a/content/docs/api-reference/make_checkpoint.md +++ b/content/docs/api-reference/make_checkpoint.md @@ -1,11 +1,14 @@ # dvc.api.make_checkpoint() -Make an [in-code checkpoint](/doc/user-guide/experiment-management/checkpoints). +Make an in-code [checkpoint]. 
```py def make_checkpoint() ``` +[checkpoint]: + /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments + #### Usage: ```py diff --git a/content/docs/user-guide/project-structure/pipelines-files.md b/content/docs/user-guide/project-structure/pipelines-files.md index e9e0bf2875..2aa6aa943f 100644 --- a/content/docs/user-guide/project-structure/pipelines-files.md +++ b/content/docs/user-guide/project-structure/pipelines-files.md @@ -381,13 +381,13 @@ validation and auto-completion. > These include a subset of the fields in `.dvc` file > [output entries](/doc/user-guide/project-structure/dvc-files#output-entries). -| Field | Description | -| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | -| `remote` | (Optional) name of the remote to use for pushing/fetching. | -| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts | -| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [in-code checkpoints](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | -| `desc` | (Optional) user description for this output. This doesn't affect any DVC operations. 
| +| Field | Description | +| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | +| `remote` | (Optional) name of the remote to use for pushing/fetching. | +| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts | +| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | +| `desc` | (Optional) user description for this output. This doesn't affect any DVC operations. | ⚠️ Note that using the `checkpoint` field in `dvc.yaml` is not compatible with `dvc repro`. From 5e5dfd3aaf9ba4f598e6c7c74bbfdaa73d2a0de1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 21:29:16 -0600 Subject: [PATCH 18/21] guide: consolidate pipeline-related info in Running into a single section and make Results another section --- .../running-experiments.md | 101 +++++------------- 1 file changed, 25 insertions(+), 76 deletions(-) diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index c568bdf843..4ccf15fb8b 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -8,104 +8,53 @@ details.
> experimentation, you may want to check the basics in > [Get Started: Experiments](/doc/start/experiments/) first. -## The pipeline +## Pipelines files -DVC relies on pipelines that codify experiment workflows (code, -stages, parameters, outputs, etc.) in a -`dvc.yaml` file. These contain the commands to run the experiments. +DVC relies on `dvc.yaml` files that contain the commands to run the +experiment(s). These files codify _pipelines_ that specify the +stages of experiment workflows (code, parameters, +outputs, etc.). > 📖 See [Get Started: Data Pipelines](/doc/start/data-pipelines) for an intro > to this topic. -[ug-pipeline-files]: /doc/user-guide/project-structure/pipelines-files +### Running the pipeline(s) -### Running the pipeline - -You can run the pipeline using default settings with `dvc exp run`: +You can run the pipeline using `dvc exp run`. It uses `./dvc.yaml` (in the +current directory) by default: ```dvc $ dvc exp run +... +Reproduced experiment(s): exp-44136 ``` -DVC keeps track of the dependency graph and runs only the stages with changed -dependencies or missing outputs. - -> Example: for a pipeline composed of `prepare`, `train`, and `evaluate` stages, -> if a dependency of `prepare` stage has changed, the downstream stages -> (`train`, `evaluate`) are also run. - -### Running specific stages - -By default DVC uses `./dvc.yaml` (in the current directory). You can specify -`dvc.yaml` files in other directories, or even specific stages to run. These are -given as the last argument to the `dvc exp run`. Examples: - -```dvc -$ dvc exp run my-project/dvc.yaml # a specific dvc.yaml file - -$ dvc exp run extract # a specific stage (from `./dvc.yaml`) - -$ dvc exp run my-project/dvc.yaml:extract - # ^ a stage from a specific dvc.yaml file -``` - -> 📖 See [reproduction `targets`](/doc/command-reference/repro#options) for all -> the details. 
- -### Running stages independently - -In some cases you may need to run a stage without invoking its dependents. The -`--single-item` (`-s`) flag allows to run the command of a single stage. - -> Example: for a pipeline composed of `prepare`, `train`, and `evaluate` stages -> and you only want to run the `train` stage to check its outputs, you can do so -> by: -> -> ```dvc -> $ dvc exp run --single-stage train -> ``` - -### Running all pipelines - -DVC projects support more than a single pipeline in one or more -`dvc.yaml` files. In this case, you can run all pipelines with a single command: - -```dvc -$ dvc exp run --all-pipelines -``` - -> Note that the order in which pipelines are executed is not guaranteed; Only -> the internal order of stage execution is. - -> (ℹ️) When your `dvc.yaml` files are organized inside recursive subfolders, you -> can selectively run the pipeline(s) using `--recursive` (takes a parent -> directory as argument). +DVC keeps track of the [dependency graph] among stages. It only runs the ones +with changed dependencies or outputs missing from the cache. You +can limit this to certain [reproduction targets] or even single stages +(`--single-item` flag). -### Running stages interactively +DVC projects actually support more than one pipeline, in one or +more `dvc.yaml` files. The `--all-pipelines` option lets you run them all at +once. -When you want to have more granular control over which stages are run, you can -use the `--interactive` option. This flag allows you to confirm each stage -before running. - -```dvc -$ dvc exp run --interactive -Going to reproduce stage: 'train'... continue? [y/n] -``` +> 📖 `dvc exp run` is an experiment-specific alternative to `dvc repro` where +> you can learn more about these and other pipeline-related options. -> Note that `dvc exp run` is an experimentation-specific alternative to -> `dvc repro`.
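As a rough illustration of how a changed dependency propagates through the dependency graph, the sketch below assumes a three-stage `prepare` → `train` → `evaluate` pipeline. `stages_to_run` is a hypothetical function, not part of DVC; it models only changed dependencies and ignores the case of outputs missing from the cache.

```python
def stages_to_run(deps, changed):
    """Return the stages that must rerun: changed ones plus all downstream.

    deps maps each stage to the stages it depends on, listed in pipeline order.
    """
    rerun = set(changed)
    grew = True
    while grew:  # repeat until no new downstream stage is picked up
        grew = False
        for stage, upstream in deps.items():
            if stage not in rerun and rerun.intersection(upstream):
                rerun.add(stage)
                grew = True
    return [stage for stage in deps if stage in rerun]


pipeline = {"prepare": [], "train": ["prepare"], "evaluate": ["train"]}

after_prepare_change = stages_to_run(pipeline, {"prepare"})
after_train_change = stages_to_run(pipeline, {"train"})
```

A change to `prepare` invalidates the whole chain, while a change that only affects `train` leaves `prepare` untouched.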
+[reproduction targets]: /doc/command-reference/repro#options +[dependency graph]: /doc/command-reference/dag#directed-acyclic-graph -### Working with the results +## Experiment results -The results of the last `dvc exp run` can be seen in the workspace -and are stored and tracked internally by DVC. +The results of the last `dvc exp run` can be seen in the workspace. +They are stored and tracked internally by DVC. To display and compare multiple experiments, use `dvc exp show` or `dvc exp diff`. `plots diff` also accepts experiments as `revisions`. See [Reviewing and Comparing Experiments][reviewing] for more details. Use `dvc exp apply` to restore the results of any other experiment instead. See -[Bring experiment results to your workspace][apply] for more. +[Bring experiment results to your workspace][apply] for more info. [reviewing]: /doc/user-guide/experiment-management/comparing-experiments [apply]: From 3239633a58944fc5723199cd75533e4d3195972c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 22:35:13 -0600 Subject: [PATCH 19/21] guide: compress Params section in Running Exps --- content/docs/command-reference/exp/run.md | 3 +- .../running-experiments.md | 78 ++++--------------- 2 files changed, 18 insertions(+), 63 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 00a13d14f1..af353c4993 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -45,8 +45,7 @@ Successful experiments can be [made persistent] by restoring them via `dvc exp branch` or `dvc exp apply` and committing them to the Git repo. Unnecessary ones can be [cleared] with `dvc exp gc`. 
-[on-the-fly]: - /doc/user-guide/experiment-management/running-experiments#updating-experiment-parameters-on-the-fly +[on-the-fly]: #example-modify-parameters-on-the-fly [queue experiments]: /doc/user-guide/experiment-management/running-experiments#the-experiments-queue [checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 4ccf15fb8b..4df7c91682 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -12,7 +12,7 @@ details. DVC relies on `dvc.yaml` files that contain the commands to run the experiment(s). These files codify _pipelines_ that specify the -stages of experiment workflows (code, parameters, +stages of experiment workflows (code, dependencies, outputs, etc.). > 📖 See [Get Started: Data Pipelines](/doc/start/data-pipelines) for an intro @@ -49,7 +49,8 @@ once. The results of the last `dvc exp run` can be seen in the workspace. They are stored and tracked internally by DVC. -To display and compare multiple experiments, use `dvc exp show` or +To display and compare multiple experiments along with their +parameters and metrics, use `dvc exp show` or `dvc exp diff`. `plots diff` also accepts experiments as `revisions`. See [Reviewing and Comparing Experiments][reviewing] for more details. @@ -60,81 +61,36 @@ Use `dvc exp apply` to restore the results of any other experiment instead. See [apply]: /doc/user-guide/experiment-management/persisting-experiments#bring-experiment-results-to-your-workspace -## (Hyper)parameters +## Tuning (hyper)parameters -ML parameters are the values that modify the underlying code's -behavior, producing different experiment results. 
Machine learning -experimentation, for example, involves searching hyperparameters that improve -the resulting model metrics. +Parameters are the values that modify the behavior of coded processes -- in this +case producing different experiment results. Machine learning experimentation +often involves defining and searching hyperparameter spaces to improve the +resulting model metrics. -In DVC projects, parameters should be read by the code from _parameter files_ -(`params.yaml` by default). DVC parses these files to track individual param -values. When a tracked param is changed, `dvc exp run` invalidates any stages -that depend on it, and reruns the experiment. +In DVC project source code, parameters should be read from _params +files_ (`params.yaml` by default) and defined in `dvc.yaml`. When a tracked +param value has changed, `dvc exp run` invalidates any stages that depend on it, +and reproduces them. -> Parameters can be defined in `dvc.yaml` directly or through `dvc stage add`. > 📖 See `dvc params` for more details. -For a params file named `params.yaml` with the contents - -```yaml -model: - learning_rate: 0.0001 -``` - -You can specify the parameter dependency as - -```dvc -$ dvc stage add -n train \ - --parameter model.learning_rate \ - --outs ... -``` - -> ⚠️ DVC does not check whether the parameters are actually used in your code. - -
- -#### Non-default parameter files - -DVC allows param files in YAML 1.2, JSON, TOML, and Python formats. When your -parameters file is named something other than `params.yaml`, you need to specify -it in both stage description and `dvc exp run`. For example using -`myparams.toml`: - -```dvc -$ dvc stage add -n train \ - -p myparams.toml:learning_rate \ - ... - -$ dvc exp run -S myparams.toml:learning_rate = 0.0001 -``` - -
- -### Updating experiment parameters on-the-fly - You could manually edit a params file and run an experiment on that basis. Since this is a common sequence, the built-in option `dvc exp run --set-param` (`-S`) -is provided as a shortcut. It takes an existing param name and its value, and -updates the file before the run for you. +is provided as a shortcut. It takes an existing param name and value, and +updates the file on-the-fly before execution. ```dvc $ cat params.yaml model: learning_rate: 0.001 + units: 64 $ dvc exp run --set-param model.learning_rate=0.0002 ... -``` - -> Note that parameters are attached to experiments so they're shown together -> when [reviewing] them (e.g. in `dvc exp show`). -To set more than one param for the same experiment, use the `-S` option multiple -times: - -```dvc -$ dvc exp run -S learning_rate=0.001 -S units=128 +$ dvc exp run -S learning_rate=0.001 -S units=128 # set multiple params +... ``` ## The experiments queue From df74ce9416435112b1ec7eadb67733a96723ea28 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 23:00:54 -0600 Subject: [PATCH 20/21] guide: summarize the Exps Queue section of Running Exps and move parallelism details back to the ref..? --- content/docs/command-reference/exp/run.md | 28 +++++-- .../running-experiments.md | 81 +++++++------------ 2 files changed, 51 insertions(+), 58 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index af353c4993..c2b5506b44 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -30,26 +30,32 @@ directories, etc. Use the `--set-param` (`-S`) option as a shortcut to change parameter values [on-the-fly] before running the experiment. -It's possible to [queue experiments] for later execution with the `--queue` flag -(nothing is actually executed). To run them, use `dvc exp run --run-all`.
Queued -experiments are run one by one by default, but can be run in parallel using the -`--jobs` option. +It's possible to [queue experiments] for later execution with the `--queue` +flag. To actually run them, use `dvc exp run --run-all`. Queued experiments are +run sequentially by default, but can be run in parallel using the `--jobs` +option. + +> ⚠️ Parallel runs are experimental and may be unstable. Make sure you're using +> a number of jobs that your environment can handle (no more than the CPU +> cores). It's also possible to run special [checkpoint experiments] that log the execution progress (useful for deep learning ML). The `--rev` and `--reset` -options are specific to these. +options have special uses for these. > 📖 See the [Running Experiments] guide for more details on all these features. -Successful experiments can be [made persistent] by restoring them via -`dvc exp branch` or `dvc exp apply` and committing them to the Git repo. -Unnecessary ones can be [cleared] with `dvc exp gc`. +[Review] run experiments with `dvc exp show`. Successful ones can be [made +persistent] by restoring them via `dvc exp branch` or `dvc exp apply` and +committing them to the Git repo. Unnecessary ones can be [cleared] with +`dvc exp gc`. [on-the-fly]: #example-modify-parameters-on-the-fly [queue experiments]: /doc/user-guide/experiment-management/running-experiments#the-experiments-queue [checkpoint experiments]: /doc/user-guide/experiment-management/checkpoints [running experiments]: /doc/user-guide/experiment-management/running-experiments +[review]: /doc/user-guide/experiment-management/comparing-experiments [made persistent]: /doc/user-guide/experiment-management/persisting-experiments [cleared]: /doc/user-guide/experiment-management/cleaning-experiments @@ -87,6 +93,10 @@ Unnecessary ones can be [cleared] with `dvc exp gc`. parallel. Only has an effect along with `--run-all`. Defaults to 1 (the queue is processed serially).
+ > Note that since queued experiments are run isolated from each other, common + > stages may sometimes be executed several times depending on the state of the + > [run-cache] at that time. + - `-r `, `--rev ` - continue an experiment from a specific checkpoint name or hash (`commit`) in `--queue` or `--temp` runs. @@ -105,6 +115,8 @@ Unnecessary ones can be [cleared] with `dvc exp gc`. - `-v`, `--verbose` - displays detailed tracing information. +[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache + ## Examples > This is based on our [Get Started](/doc/start/experiments), where you can find diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 4df7c91682..276a0f97f1 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -44,23 +44,6 @@ once. [reproduction targets]: /doc/command-reference/repro#options [dependency graph]: /doc/command-reference/dag#directed-acyclic-graph -## Experiment results - -The results of the last `dvc exp run` can be seen in the workspace. -They are stored and tracked internally by DVC. - -To display and compare multiple experiments along with their -parameters and metrics, use `dvc exp show` or -`dvc exp diff`. `plots diff` also accepts experiments as `revisions`. See -[Reviewing and Comparing Experiments][reviewing] for more details. - -Use `dvc exp apply` to restore the results of any other experiment instead. See -[Bring experiment results to your workspace][apply] for more info. 
- -[reviewing]: /doc/user-guide/experiment-management/comparing-experiments -[apply]: - /doc/user-guide/experiment-management/persisting-experiments#bring-experiment-results-to-your-workspace - ## Tuning (hyper)parameters Parameters are the values that modify the behavior of coded processes -- in this @@ -93,6 +76,23 @@ $ dvc exp run -S learning_rate=0.001 -S units=128 # set multiple params ... ``` +## Experiment results + +The results of the last `dvc exp run` can be seen in the workspace. +They are stored and tracked internally by DVC. + +To display and compare multiple experiments along with their +parameters and metrics, use `dvc exp show` or +`dvc exp diff`. `plots diff` also accepts experiments as `revisions`. See +[Reviewing and Comparing Experiments][reviewing] for more details. + +Use `dvc exp apply` to restore the results of any other experiment instead. See +[Bring experiment results to your workspace][apply] for more info. + +[reviewing]: /doc/user-guide/experiment-management/comparing-experiments +[apply]: + /doc/user-guide/experiment-management/persisting-experiments#bring-experiment-results-to-your-workspace + ## The experiments queue The `--queue` option of `dvc exp run` tells DVC to append an experiment for @@ -122,22 +122,22 @@ is found in `.git/refs/exps`, and earlier ones are in its [reflog].
-Each experiment is derived from the workspace at the time it's -queued. If you make changes in the workspace afterwards, they won't be reflected -in queued experiments (once run). - -Run them all one-by-one with the `--run-all` flag. For isolation, this is done -outside your workspace (in temporary directories). - -> Note that the order of execution is independent of their creation order. +Run them all with the `--run-all` flag: ```dvc $ dvc exp run --run-all +... ``` +> Note that the order of execution is independent of their creation order. + +Their execution happens outside your workspace in temporary +directories for isolation, so each experiment is derived from the workspace at +the time it was queued. +
-### How are queued experiments isolated? +### How are experiments isolated? DVC creates a copy of the experiment's original workspace in `.dvc/tmp/exps/` and runs it there. All workspaces share the single project cache, @@ -154,36 +154,17 @@ nohup: ignoring input and appending output to 'nohup.out' Note that Git-ignored files/dirs are excluded from queued/temp runs to avoid committing unwanted files into Git (e.g. once successful experiments are -[persisted]). - -> 💡 To include untracked files, stage them with `git add` first (before -> `dvc exp run`) and `git reset` them afterwards. +[persisted]). To include untracked files, stage them with `git add` first +(before `dvc exp run`) and `git reset` them afterwards. [persisted]: /doc/user-guide/experiment-management/persisting-experiments
-To remove all experiments from the queue and start over, you can use -`dvc exp remove --queue`. - -### Running experiments in parallel - -DVC allows to run queued experiments in parallel by specifying a number of -execution processes (`--jobs`): - -```dvc -$ dvc exp run --run-all --jobs 4 -``` - -> Note that since each experiment runs in an independent temporary directory, -> common stages may sometimes be executed several times depending -> on the state of the [run-cache] at that time. - -[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache +💡 To clear the experiments queue and start over, use `dvc exp remove --queue`. -⚠️ Parallel runs are experimental and may be unstable at this time. ⚠️ Make sure -you're using a number of jobs that your environment can handle (no more than the -CPU cores). +> 📖 See the `dvc exp run` reference for more options related to experiments +> queue, such as running them in parallel with `--jobs`. ## Checkpoint experiments From 64eaf8f8a035a543055c862a68a15d899e6d3b01 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 17 Jan 2022 23:44:36 -0600 Subject: [PATCH 21/21] guide: undo changes to Checkpoints guide extracted to https://github.com/iterative/dvc.org/pull/3189 --- content/docs/api-reference/make_checkpoint.md | 5 +- .../experiment-management/checkpoints.md | 78 ++++++++----------- .../project-structure/pipelines-files.md | 14 ++-- 3 files changed, 41 insertions(+), 56 deletions(-) diff --git a/content/docs/api-reference/make_checkpoint.md b/content/docs/api-reference/make_checkpoint.md index 705b10f3b8..9a3cb2a33d 100644 --- a/content/docs/api-reference/make_checkpoint.md +++ b/content/docs/api-reference/make_checkpoint.md @@ -1,14 +1,11 @@ # dvc.api.make_checkpoint() -Make an in-code [checkpoint]. +Make an [in-code checkpoint](/doc/user-guide/experiment-management/checkpoints). 
```py def make_checkpoint() ``` -[checkpoint]: - /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments - #### Usage: ```py diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index dd676bbe39..b3501b333b 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -2,9 +2,15 @@ _New in DVC 2.0_ -To track successive steps in a longer machine learning experiment, you can -register checkpoints from your code at runtime, for example to track the -progress with deep learning techniques. They can help you +To track successive steps in a longer experiment, you can register checkpoints +from your code at runtime. This is especially helpful in machine learning, for +example to track the progress in deep learning techniques such as evolving +neural networks. + +_Checkpoint experiments_ track a series of variations (the checkpoints) and +their execution can be stopped and resumed as needed. You interact with them +using the `--rev` and `--reset` options of `dvc exp run` (see also the +`checkpoint` field in `dvc.yaml` `outs`). They can help you - implement the best practice in deep learning to save your model weights as checkpoints. @@ -12,25 +18,8 @@ progress with deep learning techniques. They can help you - see when metrics start diverging and revert to the optimal checkpoint. - automate the process of tracking every training epoch. -Checkpoint [execution] can be stopped and resumed as needed. You interact with -them using the `--rev` and `--reset` options of `dvc exp run` (see also the -`checkpoint` field in `dvc.yaml` `outs`). - -[execution]: - /doc/user-guide/experiment-management/running-experiments#checkpoint-experiments - -
- -### ⚙️ How are checkpoints captured? - -Instead of a single reference like [regular experiments], checkpoint experiments -have multiple commits under the custom Git reference (in `.git/refs/exps`), -similar to a branch. - -[regular experiments]: - /doc/user-guide/experiment-management/experiments-overview - -
+> Experiments and checkpoints are [implemented](/blog/experiment-refs) with +> hidden Git experiment commits branches. Like with regular experiments, checkpoints can become persistent by [committing them to Git](#committing-checkpoints-to-git). @@ -73,36 +62,38 @@ running: $ pip install -r requirements.txt ``` -This will download all of the packages you need to run the example. - -To initialize this project as a DVC repository, use `dvc init`. Now -you have everything you need to get started with experiments and checkpoints. +This will download all of the packages you need to run the example. Now you have +everything you need to get started with experiments and checkpoints. ## Setting up a DVC pipeline -DVC can version data as well as the ML model weights file in checkpoints during -the training process. To enable this, you will need to set up a -[DVC pipeline](/doc/start/data-pipelines) to train your model. +DVC versions data and it also can version the ML model weights file as +checkpoints during the training process. To enable this, you will need to set up +a DVC pipeline to train your model. + +Adding a DVC pipeline only takes a few commands. At the root of the project, +run: + +```dvc +$ dvc init +``` -Now we need to add a training stage to `dvc.yaml` including `checkpoint: true` -in its output. This tells DVC which cached output(s) -to use to resume the experiment later (a circular dependency). We'll do this -with `dvc stage add`. +This sets up the files you need for your DVC pipeline to work. + +Now we need to add a stage for training our model within a DVC pipeline. We'll +do that with `dvc stage add`, which we'll explain more later. 
For now, run the +following command: ```dvc -$ dvc stage add --name train \ - --deps data/MNIST --deps train.py \ - --params seed,lr,weight_decay \ - --checkpoints model.pt \ - --plots-no-cache predictions.json \ - --live dvclive \ - python train.py +$ dvc stage add --name train --deps data/MNIST --deps train.py \ + --checkpoints model.pt --plots-no-cache predictions.json \ + --params seed,lr,weight_decay --live dvclive python train.py ``` -💡 The `--live dvclive` option enables our special logger -[DVCLive](/doc/dvclive), which helps you register checkpoints from code. +The `--live dvclive` option enables our special logger [DVCLive](/doc/dvclive), +which helps you register checkpoints from your code. The checkpoints need to be enabled in DVC at the pipeline level. The `-c / --checkpoint` option of the `dvc stage add` command defines the checkpoint @@ -141,9 +132,6 @@ stages: html: true ``` -⚠️ Note that enabling checkpoints in a `dvc.yaml` file makes it incompatible -with `dvc repro`. - Before we go any further, this is a great point to add these changes to your Git history. You can do that with the following commands: diff --git a/content/docs/user-guide/project-structure/pipelines-files.md b/content/docs/user-guide/project-structure/pipelines-files.md index 2aa6aa943f..e9e0bf2875 100644 --- a/content/docs/user-guide/project-structure/pipelines-files.md +++ b/content/docs/user-guide/project-structure/pipelines-files.md @@ -381,13 +381,13 @@ validation and auto-completion. > These include a subset of the fields in `.dvc` file > [output entries](/doc/user-guide/project-structure/dvc-files#output-entries). 
-| Field | Description | -| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | -| `remote` | (Optional) name of the remote to use for pushing/fetching. | -| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts | -| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | -| `desc` | (Optional) user description for this output. This doesn't affect any DVC operations. | +| Field | Description | +| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | +| `remote` | (Optional) name of the remote to use for pushing/fetching. | +| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts | +| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [in-code checkpoints](/doc/user-guide/experiment-management/checkpoints). 
These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | +| `desc` | (Optional) user description for this output. This doesn't affect any DVC operations. | ⚠️ Note that using the `checkpoint` field in `dvc.yaml` is not compatible with `dvc repro`.