From 5008c554efd7c3dc0abd09ec04e39446225104e9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 21 Jul 2021 08:56:10 +0000 Subject: [PATCH 01/56] guide: split Experiments (index) into sub-pages --- content/docs/sidebar.json | 8 +- .../experiment-management/checkpoints.md | 78 ++++++++---------- .../experiment-management/experiments.md | 18 +++++ .../user-guide/experiment-management/index.md | 79 ------------------- .../experiment-management/organization.md | 20 +++++ .../persistent-experiments.md | 11 +++ .../experiment-management/run-cache.md | 12 +++ 7 files changed, 101 insertions(+), 125 deletions(-) create mode 100644 content/docs/user-guide/experiment-management/experiments.md create mode 100644 content/docs/user-guide/experiment-management/organization.md create mode 100644 content/docs/user-guide/experiment-management/persistent-experiments.md create mode 100644 content/docs/user-guide/experiment-management/run-cache.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 2c059fd042..8b443f86f3 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,7 +139,13 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": ["checkpoints"] + "children": [ + "experiments", + "checkpoints", + "persistent-experiments", + "organization", + "run-cache" + ] }, "setup-google-drive-remote", "large-dataset-optimization", diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index 521ef97522..393b080f46 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -1,27 +1,30 @@ # Checkpoints -ML checkpoints are an important part of deep learning because ML engineers like -to save the model files at certain points during a training process. 
+To track successive steps in a longer experiment, you can register checkpoints +from your code at runtime. This is especially helpful in machine learning, for +example to track the progress in deep learning techniques such as evolving +neural networks. -With DVC experiments and checkpoints, you can: +_Checkpoint experiments_ track a series of variations (the checkpoints) and +their execution can be stopped and resumed as needed. You interact with them +using the `--rev` and `--reset` options of `dvc exp run` (see also the +`checkpoint` field in `dvc.yaml` `outs`). They can help you -- Implement the best practice in deep learning to save your model weights as +- implement the best practice in deep learning to save your model weights as checkpoints. -- Track all code and data changes corresponding to the checkpoints. -- See when metrics start diverging and revert to the optimal checkpoint. -- Automate the process of tracking every training epoch. +- track all code and data changes corresponding to the checkpoints. +- see when metrics start diverging and revert to the optimal checkpoint. +- automate the process of tracking every training epoch. -[The way checkpoints are implemented by DVC](/blog/experiment-refs) utilizes -_ephemeral_ experiment commits and experiment branches within DVC. They are -created using the metadata from experiments and are tracked with the `exps` -custom Git reference. +> Experiments and checkpoints are [implemented](/blog/experiment-refs) with +> hidden Git experiment commits and branches. -You can add experiments to your Git history by committing the experiment you -want to track, which you'll see later in this tutorial. +Like with regular experiments, checkpoints can become persistent by +[committing them to Git](#committing-checkpoints-to-git). -This tutorial is going to cover how to implement checkpoints in an ML project -using DVC. We're going to train a model to identify handwritten digits based on -the MNIST dataset.
+This guide covers how to implement checkpoints in an ML project using DVC. We're +going to train a model to identify handwritten digits based on the MNIST +dataset.
@@ -62,9 +65,9 @@ everything you need to get started with experiments and checkpoints. ## Setting up a DVC pipeline -DVC versions data and it also can version the machine learning model weights -file as checkpoints during the training process. To enable this, you will need -to set up a DVC pipeline to train your model. +DVC versions data and it also can version the ML model weights file as +checkpoints during the training process. To enable this, you will need to set up +a DVC pipeline to train your model. Adding a DVC pipeline only takes a few commands. At the root of the project, run: @@ -190,9 +193,8 @@ You can read about what the line `dvclive.log(k, v)` does in the The [`dvclive.next_step()`](/doc/dvclive/api-reference/next_step) line tells DVC that it can take a snapshot of the entire workspace and version it with Git. -It's important that with this approach only code with metadata is versioned in -Git (as an ephemeral commit), while the actual model weight file will be stored -in the DVC data cache. +It's important that with this approach only code with metadata is versioned, +while the actual model weight file will be stored in the DVC data cache. ## Running experiments @@ -407,39 +409,25 @@ new set of checkpoints under a new experiment branch. └─────────────────────────┴──────────┴──────┴─────────┴────────┴────────┴────────┴──────────────┘ ``` -## Adding checkpoints to Git +## Committing checkpoints to Git When you terminate training, you'll see a few commands in the terminal that will -allow you to add these changes to Git. +allow you to add these changes to Git, making them persistent: -``` +```dvc To track the changes with git, run: git add dvclive.json dvc.yaml .gitignore train.py dvc.lock -Reproduced experiment(s): exp-263da -Experiment results have been applied to your workspace. - -To promote an experiment to a Git branch run: - - dvc exp branch -``` - -You can run the following command to save your experiments to the Git history. 
- -```bash -$ git add dvclive.json dvc.yaml .gitignore train.py dvc.lock +... ``` -You can take a look at what will be committed to your Git history by running: +Running the command above will stage the checkpoint experiment with Git. You can +take a look at what would be committed first with `git status`. You should see +something similar to this in your terminal: -```bash +```dvc $ git status -``` - -You should see something similar to this in your terminal. - -``` Changes to be committed: (use "git restore --staged ..." to unstage) new file: .gitignore @@ -456,7 +444,7 @@ Untracked files: predictions.json ``` -All that's left is to commit these changes with the following command: +All that's left to do is to `git commit` the changes: ```bash $ git commit -m 'saved files from experiment' diff --git a/content/docs/user-guide/experiment-management/experiments.md b/content/docs/user-guide/experiment-management/experiments.md new file mode 100644 index 0000000000..589cca3301 --- /dev/null +++ b/content/docs/user-guide/experiment-management/experiments.md @@ -0,0 +1,18 @@ +## Experiments + +`dvc exp` commands let you automatically track a variation to an established +[data pipeline](/doc/command-reference/dag). You can create multiple isolated +experiments this way, as well as review, compare, and restore them later, or +roll back to the baseline. The basic workflow goes like this: + +- Modify stage parameters or other dependencies (e.g. input data, + source code) of committed stages. +- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results + are reflected in your workspace, and tracked automatically. +- Use [metrics](/doc/command-reference/metrics) to identify the best + experiment(s). +- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat + 🔄 +- Use `dvc exp apply` to roll back to the best one. +- Make the selected experiment persistent by committing its results to Git. 
This + cleans the slate so you can repeat the process. diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 9912abd6ad..f4903e0675 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -32,82 +32,3 @@ meaningful measures for the experimental results. > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. - -## Experiments - -`dvc exp` commands let you automatically track a variation to an established -[data pipeline](/doc/command-reference/dag). You can create multiple isolated -experiments this way, as well as review, compare, and restore them later, or -roll back to the baseline. The basic workflow goes like this: - -- Modify stage parameters or other dependencies (e.g. input data, - source code) of committed stages. -- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results - are reflected in your workspace, and tracked automatically. -- Use [metrics](/doc/command-reference/metrics) to identify the best - experiment(s). -- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat - 🔄 -- Use `dvc exp apply` to roll back to the best one. -- Make the selected experiment persistent by committing its results to Git. This - cleans the slate so you can repeat the process. - -## Checkpoints in source code - -To track successive steps in a longer experiment, you can register checkpoints -from your code at runtime. This allows you, for example, to track the progress -in deep learning techniques such as evolving neural networks. - -This kind of experiments track a series of variations (the checkpoints) and its -execution can be stopped and resumed as needed. You interact with them using -`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint` -field in `dvc.yaml` `outs`). 
- -> 📖 To learn more, see the dedicated -> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide. - -## Persistent experiments - -When your experiments are good enough to save or share, you may want to store -them persistently as Git commits in your repository. - -Whether the results were produced with `dvc repro` directly, or after a -`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` -pair in the workspace will codify the experiment as a new project -version. The right outputs (including -[metrics](/doc/command-reference/metrics)) should also be present, or available -via `dvc checkout`. - -### Organization patterns - -DVC takes care of arranging `dvc exp` experiments and the data -cache under the hood. But when it comes to full-blown persistent -experiments, it's up to you to decide how to organize them in your project. -These are the main alternatives: - -- **Git tags and branches** - use the repo's "time dimension" to distribute your - experiments. This makes the most sense for experiments that build on each - other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can - be easily visualized, for example with tools - [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). -- **Directories** - the project's "space dimension" can be structured with - directories (folders) to organize experiments. Useful when you want to see all - your experiments at the same time (without switching versions) by just - exploring the file system. -- **Hybrid** - combining an intuitive directory structure with a good repo - branching strategy tends to be the best option for complex projects. - Completely independent experiments live in separate directories, while their - progress can be found in different branches. 
- -## Automatic log of stage runs (run-cache) - -Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the -unique signature of each stage run (to `.dvc/cache/runs` by default). If it -never happened before, the stage command(s) are executed normally. Every -subsequent time a [stage](/doc/command-reference/run) runs under the same -conditions, the previous results can be restored instantly, without wasting time -or computing resources. - -✅ This built-in feature is called run-cache and it can -dramatically improve performance. It's enabled out-of-the-box (but can be -disabled with the `--no-run-cache` command option). diff --git a/content/docs/user-guide/experiment-management/organization.md b/content/docs/user-guide/experiment-management/organization.md new file mode 100644 index 0000000000..7481dd718c --- /dev/null +++ b/content/docs/user-guide/experiment-management/organization.md @@ -0,0 +1,20 @@ +### Organization Patterns + +DVC takes care of arranging `dvc exp` experiments and the data +cache under the hood. But when it comes to full-blown persistent +experiments, it's up to you to decide how to organize them in your project. +These are the main alternatives: + +- **Git tags and branches** - use the repo's "time dimension" to distribute your + experiments. This makes the most sense for experiments that build on each + other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can + be easily visualized, for example with tools + [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). +- **Directories** - the project's "space dimension" can be structured with + directories (folders) to organize experiments. Useful when you want to see all + your experiments at the same time (without switching versions) by just + exploring the file system. 
+- **Hybrid** - combining an intuitive directory structure with a good repo + branching strategy tends to be the best option for complex projects. + Completely independent experiments live in separate directories, while their + progress can be found in different branches. diff --git a/content/docs/user-guide/experiment-management/persistent-experiments.md b/content/docs/user-guide/experiment-management/persistent-experiments.md new file mode 100644 index 0000000000..410d39c90f --- /dev/null +++ b/content/docs/user-guide/experiment-management/persistent-experiments.md @@ -0,0 +1,11 @@ +## Persistent Experiments + +When your experiments are good enough to save or share, you may want to store +them persistently as Git commits in your repository. + +Whether the results were produced with `dvc repro` directly, or after a +`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` +pair in the workspace will codify the experiment as a new project +version. The right outputs (including +[metrics](/doc/command-reference/metrics)) should also be present, or available +via `dvc checkout`. diff --git a/content/docs/user-guide/experiment-management/run-cache.md b/content/docs/user-guide/experiment-management/run-cache.md new file mode 100644 index 0000000000..596dc33750 --- /dev/null +++ b/content/docs/user-guide/experiment-management/run-cache.md @@ -0,0 +1,12 @@ +## Run Cache: Automatic Log of Stage Runs + +Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the +unique signature of each stage run (to `.dvc/cache/runs` by default). If it +never happened before, the stage command(s) are executed normally. Every +subsequent time a [stage](/doc/command-reference/run) runs under the same +conditions, the previous results can be restored instantly, without wasting time +or computing resources. + +✅ This built-in feature is called run-cache and it can +dramatically improve performance. 
It's enabled out-of-the-box (but can be +disabled with the `--no-run-cache` command option). From 923040f0567038fa943d08100651ffff40aa5684 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 29 Jul 2021 03:52:32 +0000 Subject: [PATCH 02/56] case: keep Persistent Exps in basic page --- content/docs/sidebar.json | 8 +------- .../experiment-management/checkpoints.md | 2 ++ .../experiment-management/experiments.md | 14 ++++++++++++++ .../persistent-experiments.md | 11 ----------- 4 files changed, 17 insertions(+), 18 deletions(-) delete mode 100644 content/docs/user-guide/experiment-management/persistent-experiments.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 8b443f86f3..66571771e3 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,13 +139,7 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": [ - "experiments", - "checkpoints", - "persistent-experiments", - "organization", - "run-cache" - ] + "children": ["experiments", "checkpoints", "organization", "run-cache"] }, "setup-google-drive-remote", "large-dataset-optimization", diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index 393b080f46..1b18ec3f55 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -1,5 +1,7 @@ # Checkpoints +_New in DVC 2.0_ + To track successive steps in a longer experiment, you can register checkpoints from your code at runtime. 
This is especially helpful in machine learning, for example to track the progress in deep learning techniques such as evolving diff --git a/content/docs/user-guide/experiment-management/experiments.md b/content/docs/user-guide/experiment-management/experiments.md index 589cca3301..c848e8029b 100644 --- a/content/docs/user-guide/experiment-management/experiments.md +++ b/content/docs/user-guide/experiment-management/experiments.md @@ -1,5 +1,7 @@ ## Experiments +_New in DVC 2.0_ + `dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). You can create multiple isolated experiments this way, as well as review, compare, and restore them later, or @@ -16,3 +18,15 @@ roll back to the baseline. The basic workflow goes like this: - Use `dvc exp apply` to roll back to the best one. - Make the selected experiment persistent by committing its results to Git. This cleans the slate so you can repeat the process. + +## Persistent Experiments + +When your experiments are good enough to save or share, you may want to store +them persistently as Git commits in your repository. + +Whether the results were produced with `dvc repro` directly, or after a +`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` +pair in the workspace will codify the experiment as a new project +version. The right outputs (including +[metrics](/doc/command-reference/metrics)) should also be present, or available +via `dvc checkout`. diff --git a/content/docs/user-guide/experiment-management/persistent-experiments.md b/content/docs/user-guide/experiment-management/persistent-experiments.md deleted file mode 100644 index 410d39c90f..0000000000 --- a/content/docs/user-guide/experiment-management/persistent-experiments.md +++ /dev/null @@ -1,11 +0,0 @@ -## Persistent Experiments - -When your experiments are good enough to save or share, you may want to store -them persistently as Git commits in your repository. 
- -Whether the results were produced with `dvc repro` directly, or after a -`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` -pair in the workspace will codify the experiment as a new project -version. The right outputs (including -[metrics](/doc/command-reference/metrics)) should also be present, or available -via `dvc checkout`. From 3ae85e5e9a586ca2604c92da76f4a6957dbfa365 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 29 Jul 2021 03:54:56 +0000 Subject: [PATCH 03/56] cases: keep Run-cache in basic Exps page --- content/docs/sidebar.json | 2 +- .../user-guide/experiment-management/experiments.md | 13 +++++++++++++ .../user-guide/experiment-management/run-cache.md | 12 ------------ 3 files changed, 14 insertions(+), 13 deletions(-) delete mode 100644 content/docs/user-guide/experiment-management/run-cache.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 66571771e3..241852fac2 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,7 +139,7 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": ["experiments", "checkpoints", "organization", "run-cache"] + "children": ["experiments", "checkpoints", "organization"] }, "setup-google-drive-remote", "large-dataset-optimization", diff --git a/content/docs/user-guide/experiment-management/experiments.md b/content/docs/user-guide/experiment-management/experiments.md index c848e8029b..5bb267f4cd 100644 --- a/content/docs/user-guide/experiment-management/experiments.md +++ b/content/docs/user-guide/experiment-management/experiments.md @@ -30,3 +30,16 @@ pair in the workspace will codify the experiment as a new project version. The right outputs (including [metrics](/doc/command-reference/metrics)) should also be present, or available via `dvc checkout`. 
+ +## Run Cache: Automatic Log of Stage Runs + +Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the +unique signature of each stage run (to `.dvc/cache/runs` by default). If it +never happened before, the stage command(s) are executed normally. Every +subsequent time a [stage](/doc/command-reference/run) runs under the same +conditions, the previous results can be restored instantly, without wasting time +or computing resources. + +✅ This built-in feature is called run-cache and it can +dramatically improve performance. It's enabled out-of-the-box (but can be +disabled with the `--no-run-cache` command option). diff --git a/content/docs/user-guide/experiment-management/run-cache.md b/content/docs/user-guide/experiment-management/run-cache.md deleted file mode 100644 index 596dc33750..0000000000 --- a/content/docs/user-guide/experiment-management/run-cache.md +++ /dev/null @@ -1,12 +0,0 @@ -## Run Cache: Automatic Log of Stage Runs - -Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the -unique signature of each stage run (to `.dvc/cache/runs` by default). If it -never happened before, the stage command(s) are executed normally. Every -subsequent time a [stage](/doc/command-reference/run) runs under the same -conditions, the previous results can be restored instantly, without wasting time -or computing resources. - -✅ This built-in feature is called run-cache and it can -dramatically improve performance. It's enabled out-of-the-box (but can be -disabled with the `--no-run-cache` command option). 
From 29b17b228d2757f518a0dbcfcc29c34f9f7be01c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 29 Jul 2021 06:02:12 +0000 Subject: [PATCH 04/56] guide: edit Exp Mgmt index (intro) --- .../user-guide/basic-concepts/experiment.md | 5 +- .../user-guide/experiment-management/index.md | 60 +++++++++++-------- 2 files changed, 39 insertions(+), 26 deletions(-) diff --git a/content/docs/user-guide/basic-concepts/experiment.md b/content/docs/user-guide/basic-concepts/experiment.md index 5cd8c44b0e..03f8f4081e 100644 --- a/content/docs/user-guide/basic-concepts/experiment.md +++ b/content/docs/user-guide/basic-concepts/experiment.md @@ -1,11 +1,12 @@ --- name: Experiment -match: [experiment, experiments] +match: [experiment, experiments, 'DVC experiments'] tooltip: >- An attempt to reach desired/better/interesting results during data pipelining or ML model development. DVC is designed to help [manage experiments](/doc/start/experiments), having [built-in mechanisms](/doc/user-guide/experiment-management) like the [run-cache](/doc/user-guide/project-structure/internal-files#run-cache) and - the `dvc experiments` commands (available on DVC 2.0 and above). + the [`dvc experiments`](/doc/command-reference/exp) commands (available on DVC + 2.0 and above). --- diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index aeacb6ea77..103aa8c3dc 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -1,34 +1,46 @@ # Experiment Management -_New in DVC 2.0_ - Data science and ML are iterative processes that require a large number of attempts to reach a certain level of a metric. Experimentation is part of the development of data features, hyperspace exploration, deep learning -optimization, etc. DVC helps you codify and manage all of your -experiments, supporting these main approaches: - -1. 
Create [experiments](#experiments) that derive from your latest project - version without having to track them manually. DVC does that automatically, - letting you list and compare them. The best ones can be made persistent, and - the rest archived. -2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series - of variations, forming a deep experiment. DVC helps you capture them at - runtime, and manage them in batches. -3. Make experiments or checkpoints [persistent](#persistent-experiments) by - committing them to your repository. Or create these versions - from scratch like typical project changes. - - At this point you may also want to consider the different - [ways to organize](#organization-patterns) experiments in your project (as - Git branches, as folders, etc.). - -DVC also provides specialized features to codify and analyze experiments. +optimization, etc. + +Some of DVC's base features already help you codify and analyze experiments. [Parameters](/doc/command-reference/params) are simple values you can tweak in a -human-readable text file, which cause different behaviors in your code and -models. On the other end, [metrics](/doc/command-reference/metrics) (and +formatted text file; they cause different behaviors in your code and models. On +the other end, [metrics](/doc/command-reference/metrics) (and [plots](/doc/command-reference/plots)) let you define, visualize, and compare -meaningful measures for the experimental results. +quantitative measures of your results. + +## DVC Experiments + +_New in DVC 2.0_ + +The `dvc experiments` features are designed to support these main approaches: + +1. Create [experiments] that derive from your latest project version without + polluting your Git history. DVC tracks them for you, letting you list and + compare them. The best ones can be made persistent, and the rest left as + history or cleared. +1. 
[Queue] and process series of experiments based on a parameter search or + other modifications to your baseline. +1. Generate [checkpoints] during your code execution to analyze the internal + progress of deep experiments. DVC captures them at runtime, and can manage + them in batches. +1. Make experiments [persistent] by committing them to your + repository history. + +[experiments]: /doc/user-guide/experiment-management/experiments +[queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution +[checkpoints]: /doc/user-guide/experiment-management/checkpoints +[persistent]: + /doc/user-guide/experiment-management/experiments#persistent-experiments > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. + +You may also want to consider the different [ways to organize experiments] in +your project (as Git branches, as folders, etc.). + +[ways to organize experiments]: + /doc/user-guide/experiment-management/organization From e21fef4db601c0bd7ced258f003b19246403ead0 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 29 Jul 2021 06:24:35 +0000 Subject: [PATCH 05/56] guide: edit basic Exps page inc. persisting them and move run-cache to guide intro (index) --- .../experiment-management/experiments.md | 22 +++++-------------- .../user-guide/experiment-management/index.md | 18 +++++++++++++++ 2 files changed, 23 insertions(+), 17 deletions(-) diff --git a/content/docs/user-guide/experiment-management/experiments.md b/content/docs/user-guide/experiment-management/experiments.md index 5bb267f4cd..966bc1a1f9 100644 --- a/content/docs/user-guide/experiment-management/experiments.md +++ b/content/docs/user-guide/experiment-management/experiments.md @@ -11,8 +11,7 @@ roll back to the baseline. The basic workflow goes like this: source code) of committed stages. - Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results are reflected in your workspace, and tracked automatically. 
-- Use [metrics](/doc/command-reference/metrics) to identify the best - experiment(s). +- Use `dvc metrics` to identify the best experiment(s). - Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat 🔄 - Use `dvc exp apply` to roll back to the best one. @@ -25,21 +24,10 @@ When your experiments are good enough to save or share, you may want to store them persistently as Git commits in your repository. Whether the results were produced with `dvc repro` directly, or after a -`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` -pair in the workspace will codify the experiment as a new project -version. The right outputs (including +`dvc exp` workflow, `dvc.yaml` and `dvc.lock` will define the experiment as a +new project version. The right outputs (including [metrics](/doc/command-reference/metrics)) should also be present, or available via `dvc checkout`. -## Run Cache: Automatic Log of Stage Runs - -Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the -unique signature of each stage run (to `.dvc/cache/runs` by default). If it -never happened before, the stage command(s) are executed normally. Every -subsequent time a [stage](/doc/command-reference/run) runs under the same -conditions, the previous results can be restored instantly, without wasting time -or computing resources. - -✅ This built-in feature is called run-cache and it can -dramatically improve performance. It's enabled out-of-the-box (but can be -disabled with the `--no-run-cache` command option). +Use `dvc exp apply` and `dvc exp branch` to persist experiments in your Git +history. 
diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 103aa8c3dc..69f0d25c6d 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -12,6 +12,24 @@ the other end, [metrics](/doc/command-reference/metrics) (and [plots](/doc/command-reference/plots)) let you define, visualize, and compare quantitative measures of your results. +
+ +## 💡 Run Cache: Automatic Log of Stage Runs + +Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it +logs the unique signature of each stage run (in `.dvc/cache/runs` by default). +If it never happened before, the stage command(s) are executed normally. Every +subsequent time a [stage](/doc/command-reference/run) runs under the same +conditions, the previous results can be restored instantly, without wasting time +or computing resources. + +✅ This built-in feature is called run-cache and it can +dramatically improve performance. It's enabled out-of-the-box (can be disabled), +which means DVC is already saving all of your tests and experiments behind the +scenes. But there's no easy way to explore it. + +
+ ## DVC Experiments _New in DVC 2.0_ From d8f2d7cee68183c832bee9f328b173d1cdaa1c08 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 4 Aug 2021 15:53:44 +0000 Subject: [PATCH 06/56] guide: rename DVC Exps, remove Org Exps page --- content/docs/sidebar.json | 7 +---- .../{experiments.md => dvc-experiments.md} | 2 +- .../user-guide/experiment-management/index.md | 31 ++++++++++++++++--- .../experiment-management/organization.md | 20 ------------ 4 files changed, 29 insertions(+), 31 deletions(-) rename content/docs/user-guide/experiment-management/{experiments.md => dvc-experiments.md} (98%) delete mode 100644 content/docs/user-guide/experiment-management/organization.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index c88ba60f07..d4d8825040 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,12 +139,7 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": [ - "experiments", - "sharing-experiments", - "checkpoints", - "organization" - ] + "children": ["dvc-experiments", "sharing-experiments", "checkpoints"] }, "setup-google-drive-remote", "large-dataset-optimization", diff --git a/content/docs/user-guide/experiment-management/experiments.md b/content/docs/user-guide/experiment-management/dvc-experiments.md similarity index 98% rename from content/docs/user-guide/experiment-management/experiments.md rename to content/docs/user-guide/experiment-management/dvc-experiments.md index 966bc1a1f9..cdc71e77c1 100644 --- a/content/docs/user-guide/experiment-management/experiments.md +++ b/content/docs/user-guide/experiment-management/dvc-experiments.md @@ -1,4 +1,4 @@ -## Experiments +## DVC Experiments _New in DVC 2.0_ diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 69f0d25c6d..15a566c1d7 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ 
b/content/docs/user-guide/experiment-management/index.md @@ -57,8 +57,31 @@ The `dvc experiments` features are designed to support these main approaches: > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. -You may also want to consider the different [ways to organize experiments] in -your project (as Git branches, as folders, etc.). +More information in the +[full guide](/doc/user-guide/experiment-management/dvc-experiments). -[ways to organize experiments]: - /doc/user-guide/experiment-management/organization +### Organization Patterns + +It's up to you to decide how to organize completed experiments. These are the +main alternatives: + +- **Git tags and branches** - use the repo's "time dimension" to distribute your + experiments. This makes the most sense for experiments that build on each + other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can + be easily visualized, for example with tools + [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). +- **Directories** - the project's "space dimension" can be structured with + directories (folders) to organize experiments. Useful when you want to see all + your experiments at the same time (without switching versions) by just + exploring the file system. +- **Hybrid** - combining an intuitive directory structure with a good repo + branching strategy tends to be the best option for complex projects. + Completely independent experiments live in separate directories, while their + progress can be found in different branches. + +DVC takes care of arranging `dvc exp` experiments and the data +cache under the hood so there's no need to decide on the above +until your `dvc experiments` are made [persistent]. 
+ +[persistent]: + /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments diff --git a/content/docs/user-guide/experiment-management/organization.md b/content/docs/user-guide/experiment-management/organization.md deleted file mode 100644 index 7481dd718c..0000000000 --- a/content/docs/user-guide/experiment-management/organization.md +++ /dev/null @@ -1,20 +0,0 @@ -### Organization Patterns - -DVC takes care of arranging `dvc exp` experiments and the data -cache under the hood. But when it comes to full-blown persistent -experiments, it's up to you to decide how to organize them in your project. -These are the main alternatives: - -- **Git tags and branches** - use the repo's "time dimension" to distribute your - experiments. This makes the most sense for experiments that build on each - other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can - be easily visualized, for example with tools - [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). -- **Directories** - the project's "space dimension" can be structured with - directories (folders) to organize experiments. Useful when you want to see all - your experiments at the same time (without switching versions) by just - exploring the file system. -- **Hybrid** - combining an intuitive directory structure with a good repo - branching strategy tends to be the best option for complex projects. - Completely independent experiments live in separate directories, while their - progress can be found in different branches. 
From 1337453578ad98f8cbf5cbb549ef6abbd03ae1c1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 4 Aug 2021 16:04:28 +0000 Subject: [PATCH 07/56] guide: bash -> dvc in EM/Checkpoints --- .../user-guide/experiment-management/checkpoints.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index 1b18ec3f55..0e12f611e0 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -35,7 +35,7 @@ dataset. You can follow along with the steps here or you can clone the repo directly from GitHub and play with it. To clone the repo, run the following commands. -```bash +```dvc $ git clone https://github.com/iterative/checkpoints-tutorial $ cd checkpoints-tutorial ``` @@ -43,7 +43,7 @@ $ cd checkpoints-tutorial It is highly recommended you create a virtual environment for this example. You can do that by running: -```bash +```dvc $ python3 -m venv .venv ``` @@ -56,7 +56,7 @@ following commands. Once you have your environment set up, you can install the dependencies by running: -```bash +```dvc $ pip install -r requirements.txt ``` @@ -133,7 +133,7 @@ stages: Before we go any further, this is a great point to add these changes to your Git history. You can do that with the following commands: -```bash +```dvc $ git add . 
$ git commit -m "created DVC pipeline" ``` @@ -448,7 +448,7 @@ Untracked files: All that's left to do is to `git commit` the changes: -```bash +```dvc $ git commit -m 'saved files from experiment' ``` From 8d9352197bfe76f7916c3783435c2205f25c4e09 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 4 Aug 2021 16:06:35 +0000 Subject: [PATCH 08/56] guide: fix exps link --- content/docs/user-guide/experiment-management/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 15a566c1d7..f7d49feafb 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -48,11 +48,11 @@ The `dvc experiments` features are designed to support these main approaches: 1. Make experiments [persistent] by committing them to your repository history. -[experiments]: /doc/user-guide/experiment-management/experiments +[experiments]: /doc/user-guide/experiment-management/dvc-experiments [queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution [checkpoints]: /doc/user-guide/experiment-management/checkpoints [persistent]: - /doc/user-guide/experiment-management/experiments#persistent-experiments + /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. 
From 44a16149f64e8e8d336a7b147937b3d9ead5e148 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Aug 2021 11:14:26 -0500 Subject: [PATCH 09/56] guide: summarize Sharing Exps intro --- .../sharing-experiments.md | 34 +++++++------------ 1 file changed, 12 insertions(+), 22 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index cd6103b019..4de2f181ff 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,37 +1,27 @@ # Sharing Experiments -There are two types of remotes that can store experiments. Git remotes are -distributed copies of the Git repository, for example on GitHub or GitLab. - -[DVC remotes](/doc/command-reference/remote) on the other hand are -storage-specific locations (e.g. Amazon S3 or Google Drive) which we can -configure with `dvc remote`. DVC uses them to store and fetch large files that -don't normally fit inside Git repos. - -DVC needs both kinds of remotes for backing up and sharing experiments. - -Experiment files that are normally tracked in Git (like code versions) are -shared using Git remotes, and files or directories tracked with DVC (like -datasets) are shared using DVC remotes. - -> See [Git remotes guide] and `dvc remote add` for information on setting them -> up. +Two types of _remotes_ are needed to upload experiments for sharing. +[Git remotes](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes) +are distributed copies of the Git repository, hosted for example on GitHub or +GitLab. Small files like experimental code and +[DVC metafiles](/doc/user-guide/project-structure) files will go there. +[DVC remotes](/doc/command-reference/remote) on the other hand are data storage +locations (e.g. Amazon S3 or Google Drive). 
You can use them to back up and +[share data](/doc/use-cases/sharing-data-and-model-files) files and directories +that don't fit inside Git repos. + +> See this [Git remotes guide] and `dvc remote add` for ifo. on setting them up. [git remotes guide]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -Normally, there should already be a Git remote called `origin` when you clone a -repo. Use `git remote -v` to list your Git remotes: +You can list your remotes with `git remote -v` and `dvc remote list`: ```dvc $ git remote -v origin https://github.com/iterative/get-started-experiments (fetch) origin https://github.com/iterative/get-started-experiments (push) -``` - -Similarly, you can see the DVC remotes in you project using `dvc remote list`: -```dvc $ dvc remote list storage https://remote.dvc.org/get-started-experiments ``` From 3dbcb5b24a6e9f989c1f8ca2143abd09a0eb37f4 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 15 Aug 2021 13:18:31 -0500 Subject: [PATCH 10/56] ref: link from exp push/pull to Exp Sharing guide --- content/docs/command-reference/exp/pull.md | 6 ++++-- content/docs/command-reference/exp/push.md | 7 +++++-- content/docs/command-reference/exp/run.md | 2 +- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/content/docs/command-reference/exp/pull.md b/content/docs/command-reference/exp/pull.md index 32fee2b766..449e52769f 100644 --- a/content/docs/command-reference/exp/pull.md +++ b/content/docs/command-reference/exp/pull.md @@ -17,8 +17,10 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for +[sharing experiments] across repository copies via Git and DVC remotes. 
+ +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with `dvc experiments` because > these are saved under custom Git references. See **How does DVC track diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md index eaa4a793d2..2638c67479 100644 --- a/content/docs/command-reference/exp/push.md +++ b/content/docs/command-reference/exp/push.md @@ -17,8 +17,11 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for +[sharing experiments] across repository copies via Git and DVC +remotes. + +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with `dvc experiments` because > these are saved under custom Git references. See **How does DVC track diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 295e647928..885d230083 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -44,7 +44,7 @@ option. Experiments are custom [Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References) (found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked -out by DVC). Note that these commits are not pushed to the Git remote by default +out by DVC). Note that these commits are not pushed to Git remotes by default (see `dvc exp push`).
From 3fff05146812e3466b335b50eec34d6ea8969a8c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 15 Aug 2021 13:22:34 -0500 Subject: [PATCH 11/56] Update content/docs/user-guide/experiment-management/sharing-experiments.md --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 4de2f181ff..bfe616b238 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -10,7 +10,7 @@ locations (e.g. Amazon S3 or Google Drive). You can use them to back up and [share data](/doc/use-cases/sharing-data-and-model-files) files and directories that don't fit inside Git repos. -> See this [Git remotes guide] and `dvc remote add` for ifo. on setting them up. +> See this [Git remotes guide] and `dvc remote add` for info. on setting them up. 
[git remotes guide]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From c419fe6ca82eda1f39e61fed87a79c1cf5c38377 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 15 Aug 2021 13:40:39 -0500 Subject: [PATCH 12/56] guide: rename Exp Sharing sections --- .../experiment-management/sharing-experiments.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 4de2f181ff..2e495e93ef 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -26,7 +26,7 @@ $ dvc remote list storage https://remote.dvc.org/get-started-experiments ``` -## Uploading experiments to remotes +## Uploading experiments You can upload an experiment and its files to both remotes using `dvc exp push` (requires the Git remote name and experiment name as arguments). @@ -53,7 +53,7 @@ performance also depend on the connection bandwidth and remote configurations. [run-cache]: /doc/user-guide/project-structure/internal-files#run-cache -## Listing experiments remotely +## Listing remote experiments In order to list experiments in a DVC project, you can use the `dvc exp list` command. With no command line options, it lists the experiments in the current @@ -98,7 +98,7 @@ cnn-64 cnn-96 ``` -## Downloading experiments from remotes +## Downloading experiments When you clone a DVC repository, it doesn't fetch any experiments by default. In order to get them, use `dvc exp pull` (with the Git remote and the experiment @@ -120,7 +120,7 @@ can set the number with `--jobs` (`-j`). If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. 
-### Example: Pushing or pulling multiple experiments +## Example: Sharing multiple experiments You can create a loop to upload or download all experiments like this: @@ -130,8 +130,7 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` -> Without `--all`, only the experiments derived from the current commit will be -> pushed/pulled. +## Example: Dedicated experiment directories ## Example: Creating a directory for an experiment From c473e515c451bd2e51bfd426796b4f04de688c86 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 15 Aug 2021 13:41:11 -0500 Subject: [PATCH 13/56] guide: summarize Exp Sharing examples --- .../sharing-experiments.md | 38 ++++++++----------- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 2e495e93ef..0e0de3dd97 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -122,7 +122,8 @@ overwrite it unless you supply `--force`. ## Example: Sharing multiple experiments -You can create a loop to upload or download all experiments like this: +You can create a loop to push or pull all experiments. For example, in a Linux +terminal: ```dvc $ dvc exp list --all --names-only | while read -r expname ; do \ @@ -132,18 +133,15 @@ done ## Example: Dedicated experiment directories -## Example: Creating a directory for an experiment - -A good way to isolate experiments is to create a separate home directory for -each one. +A good way to isolate experiments is to create a separate directory outside the +current repository for each one. > Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here > we'll see how to use `dvc exp pull` to copy an experiment. -Suppose there is a DVC repository in `~/my-project` with multiple -experiments.
Let's create a copy of experiment `exp-abc12` from there. - -First, clone the repo into another directory: +Suppose there is a DVC repo in `~/my-project` with multiple experiments. Let's +create a copy of experiment `exp-abc12` from it. First, clone the repo into +another directory: ```dvc $ git clone ~/my-project ~/my-experiment @@ -160,29 +158,25 @@ main: ... ``` -If there is no DVC remote in the original repository, you can define its -cache as the clone's `dvc remote`: +If the original repository doesn't have a `dvc remote`, you can define its +cache as the clone's remote storage: ```dvc $ dvc remote add --local --default storage ~/my-project/.dvc/cache ``` -> ⚠️ `--local` is important here, so that the configuration change doesn't get -> to the original repo accidentally. +> ⚠️ `--local` is important here, so that the configuration changes don't +> accidentally get to the original repo. -If there's a DVC remote for the project, assuming the experiments have been -pushed there, you can pull the one in question: +Having a DVC remote (and assuming the experiments have been pushed or cached +there) you can `dvc exp pull` the one in question. You can then +`dvc exp apply` it and get a workspace that contains all of its +files: ```dvc $ dvc exp pull origin exp-abc12 -``` - -Then we can `dvc apply` this experiment and get a workspace that -contains all of its files: - -```dvc $ dvc exp apply exp-abc12 ``` -Now you have a dedicated directory for your experiment, containing all its +Now you have a separate repo directory for your experiment, containing all its artifacts!
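Editor's sketch, not part of the patch: the push/pull loop from the "Sharing multiple experiments" example could be wrapped in a small shell helper. The function name `share_experiments` and the `DVC` override variable are hypothetical and not part of DVC. The sketch assumes, as the guide states, that `dvc exp push` and `dvc exp pull` take the Git remote name followed by the experiment name, and that the names arrive one per line on standard input.

```shell
# share_experiments ACTION REMOTE
# Reads experiment names from stdin (one per line) and pushes or pulls each.
# Hypothetical helper; set DVC=echo to print the commands instead of running them.
share_experiments() {
  action="$1"   # "push" or "pull"
  remote="$2"   # Git remote name, e.g. origin
  case "$action" in
    push|pull) ;;
    *) echo "usage: share_experiments push|pull <git-remote>" >&2; return 2 ;;
  esac
  if [ -z "$remote" ]; then
    echo "usage: share_experiments push|pull <git-remote>" >&2
    return 2
  fi
  while read -r expname; do
    # Skip blank lines in the listing.
    [ -n "$expname" ] || continue
    # </dev/null keeps the command from consuming the remaining names.
    ${DVC:-dvc} exp "$action" "$remote" "$expname" </dev/null
  done
}
```

One possible use is `dvc exp list --all --names-only | share_experiments push origin`. Running it first with `DVC=echo` shows which commands would be issued without contacting any remote.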
From 1da6bd4135c4afa44f6a5adc403241f3758171af Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 15 Aug 2021 13:47:45 -0500 Subject: [PATCH 14/56] guide: link from Exp Mgmt index to Sharing --- content/docs/user-guide/experiment-management/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index f7d49feafb..dfcf9d18ba 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -47,12 +47,14 @@ The `dvc experiments` features are designed to support these main approaches: them in batches. 1. Make experiments [persistent] by committing them to your repository history. +1. Easily [share and isolate] experiments using Git and DVC remotes. [experiments]: /doc/user-guide/experiment-management/dvc-experiments [queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution [checkpoints]: /doc/user-guide/experiment-management/checkpoints [persistent]: /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments + [share and isolate]: /doc/user-guide/experiment-management/sharing-experiments > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. 
From ad193b9df346e8343d19028d131c22a4b1a221f7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:17:59 -0500 Subject: [PATCH 15/56] guide: ~~isolate~~ from link to Exp Sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-730211648 --- content/docs/user-guide/experiment-management/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index dfcf9d18ba..8ad9841637 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -47,7 +47,7 @@ The `dvc experiments` features are designed to support these main approaches: them in batches. 1. Make experiments [persistent] by committing them to your repository history. -1. Easily [share and isolate] experiments using Git and DVC remotes. +1. Easily [share experiments] using Git and DVC remotes. [experiments]: /doc/user-guide/experiment-management/dvc-experiments [queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution From 7463e85c18d714de9eea78431f001f739b345078 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:19:07 -0500 Subject: [PATCH 16/56] Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: David de la Iglesia Castro --- .../user-guide/experiment-management/sharing-experiments.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index bfe616b238..9ffcec4895 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -19,8 +19,8 @@ You can list your remotes with `git remote -v` and `dvc remote list`: ```dvc $ git remote -v -origin 
https://github.com/iterative/get-started-experiments (fetch) -origin https://github.com/iterative/get-started-experiments (push) +origin git@github.com:iterative/get-started-experiments.git (fetch) +origin git@github.com:iterative/get-started-experiments.git (push) $ dvc remote list storage https://remote.dvc.org/get-started-experiments From 2744a97ddf7720d021656460c4bdf2173d8b1cbe Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:26:32 -0500 Subject: [PATCH 17/56] guide: mention only SSH Git URLs support exp sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-732522579 --- .../experiment-management/sharing-experiments.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 9ffcec4895..d55ffe0798 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -10,7 +10,12 @@ locations (e.g. Amazon S3 or Google Drive). You can use them to back up and [share data](/doc/use-cases/sharing-data-and-model-files) files and directories that don't fit inside Git repos. -> See this [Git remotes guide] and `dvc remote add` for info. on setting them up. +> See this [Git remotes guide] and `dvc remote add` for info. on setting them +> up. +> ⚠️ Note that only [SSH Git URLs] support DVC experiment sharing. 
+ +[ssh git urls]: + https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols [git remotes guide]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From 2e3379968939486bb00ae54e2bab4777b4f8f363 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:34:01 -0500 Subject: [PATCH 18/56] guide: update dvc remote example in sharing exps --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index d55ffe0798..5c8885a1b3 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -28,7 +28,7 @@ origin git@github.com:iterative/get-started-experiments.git (fetch) origin git@github.com:iterative/get-started-experiments.git (push) $ dvc remote list -storage https://remote.dvc.org/get-started-experiments +storage s3://mybucket/my-dvc-store ``` ## Uploading experiments to remotes From c60b5fe3bca0aab8901f48845bffd758b9f1b3d4 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:42:30 -0500 Subject: [PATCH 19/56] yarn format some files per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145 --- content/docs/command-reference/exp/pull.md | 4 ++-- content/docs/command-reference/exp/push.md | 5 ++--- content/docs/user-guide/experiment-management/index.md | 2 +- .../user-guide/experiment-management/sharing-experiments.md | 1 - 4 files changed, 5 insertions(+), 7 deletions(-) diff --git a/content/docs/command-reference/exp/pull.md b/content/docs/command-reference/exp/pull.md index 449e52769f..f9d3e5aeab 100644 --- a/content/docs/command-reference/exp/pull.md +++ b/content/docs/command-reference/exp/pull.md @@ -17,8 +17,8 @@ positional 
arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for -[sharing experiments] across repository copies via Git and DVC remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. [sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md index 2638c67479..46bcdcb2ea 100644 --- a/content/docs/command-reference/exp/push.md +++ b/content/docs/command-reference/exp/push.md @@ -17,9 +17,8 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for -[sharing experiments] across repository copies via Git and DVC -remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. [sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 8ad9841637..4a6fe96d80 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -54,7 +54,7 @@ The `dvc experiments` features are designed to support these main approaches: [checkpoints]: /doc/user-guide/experiment-management/checkpoints [persistent]: /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments - [share and isolate]: /doc/user-guide/experiment-management/sharing-experiments +[share and isolate]: /doc/user-guide/experiment-management/sharing-experiments > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. 
diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 5c8885a1b3..286ac4ad57 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -16,7 +16,6 @@ that don't fit inside Git repos. [ssh git urls]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols - [git remotes guide]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From d1422b10a66b378d961f0c33627eddc19e464f54 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 03:44:11 -0500 Subject: [PATCH 20/56] guide: consolidate Exp Sharing intro (#2711) * guide: summarize Sharing Exps intro * ref: link from exp push/pull to Exp Sharing guide * Update content/docs/user-guide/experiment-management/sharing-experiments.md * guide: link from Exp Mgmt index to Sharing * guide: ~~isolate~~ from link to Exp Sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-730211648 * Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: David de la Iglesia Castro * guide: mention only SSH Git URLs support exp sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-732522579 * guide: update dvc remote example in sharing exps * yarn format some files per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145 Co-authored-by: David de la Iglesia Castro --- content/docs/command-reference/exp/pull.md | 6 ++- content/docs/command-reference/exp/push.md | 6 ++- content/docs/command-reference/exp/run.md | 2 +- .../user-guide/experiment-management/index.md | 2 + .../sharing-experiments.md | 46 ++++++++----------- 5 files changed, 31 insertions(+), 31 deletions(-) diff --git a/content/docs/command-reference/exp/pull.md 
b/content/docs/command-reference/exp/pull.md index 32fee2b766..f9d3e5aeab 100644 --- a/content/docs/command-reference/exp/pull.md +++ b/content/docs/command-reference/exp/pull.md @@ -17,8 +17,10 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. + +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with `dvc experiments` because > these are saved under custom Git references. See **How does DVC track diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md index eaa4a793d2..46bcdcb2ea 100644 --- a/content/docs/command-reference/exp/push.md +++ b/content/docs/command-reference/exp/push.md @@ -17,8 +17,10 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. + +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with `dvc experiments` because > these are saved under custom Git references. See **How does DVC track diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 295e647928..885d230083 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -44,7 +44,7 @@ option. Experiments are custom [Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References) (found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked -out by DVC). 
Note that these commits are not pushed to the Git remote by default +out by DVC). Note that these commits are not pushed to Git remotes by default (see `dvc exp push`). diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index f7d49feafb..4a6fe96d80 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -47,12 +47,14 @@ The `dvc experiments` features are designed to support these main approaches: them in batches. 1. Make experiments [persistent] by committing them to your repository history. +1. Easily [share experiments] using Git and DVC remotes. [experiments]: /doc/user-guide/experiment-management/dvc-experiments [queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution [checkpoints]: /doc/user-guide/experiment-management/checkpoints [persistent]: /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments +[share and isolate]: /doc/user-guide/experiment-management/sharing-experiments > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index cd6103b019..286ac4ad57 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,39 +1,33 @@ # Sharing Experiments -There are two types of remotes that can store experiments. Git remotes are -distributed copies of the Git repository, for example on GitHub or GitLab. - -[DVC remotes](/doc/command-reference/remote) on the other hand are -storage-specific locations (e.g. Amazon S3 or Google Drive) which we can -configure with `dvc remote`. DVC uses them to store and fetch large files that -don't normally fit inside Git repos. 
- -DVC needs both kinds of remotes for backing up and sharing experiments. - -Experiment files that are normally tracked in Git (like code versions) are -shared using Git remotes, and files or directories tracked with DVC (like -datasets) are shared using DVC remotes. - -> See [Git remotes guide] and `dvc remote add` for information on setting them -> up. - +Two types of _remotes_ are needed to upload experiments for sharing. +[Git remotes](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes) +are distributed copies of the Git repository, hosted for example on GitHub or +GitLab. Small files like experimental code and +[DVC metafiles](/doc/user-guide/project-structure) files will go there. +[DVC remotes](/doc/command-reference/remote) on the other hand are data storage +locations (e.g. Amazon S3 or Google Drive). You can use them to back up and +[share data](/doc/use-cases/sharing-data-and-model-files) files and directories +that don't fit inside Git repos. + +> See this [Git remotes guide] and `dvc remote add` for info. on setting them +> up. +> ⚠️ Note that only [SSH Git URLs] support DVC experiment sharing. + +[ssh git urls]: + https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols [git remotes guide]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -Normally, there should already be a Git remote called `origin` when you clone a -repo. 
Use `git remote -v` to list your Git remotes: +You can list your remotes with `git remote -v` and `dvc remote list`: ```dvc $ git remote -v -origin https://github.com/iterative/get-started-experiments (fetch) -origin https://github.com/iterative/get-started-experiments (push) -``` +origin git@github.com:iterative/get-started-experiments.git (fetch) +origin git@github.com:iterative/get-started-experiments.git (push) -Similarly, you can see the DVC remotes in you project using `dvc remote list`: - -```dvc $ dvc remote list -storage https://remote.dvc.org/get-started-experiments +storage s3://mybucket/my-dvc-store ``` ## Uploading experiments to remotes From 8b6a3f6f7ecfbf8d90a670612b23d2eb3a89620a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 18 Aug 2021 06:33:20 -0500 Subject: [PATCH 21/56] prettier sharing-experiments.md --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 1d981b18ae..90e791ef61 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -138,7 +138,7 @@ done ## Example: Dedicated experiment directories A good way to isolate experiments is to create a separate directory outside the -current repository for each one. +current repository for each one. > Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here > we'll see how to use `dvc exp pull` to copy an experiment. 
From 209e848e892b17ba81b27249bfc99e5502b21f92 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 20 Aug 2021 16:15:04 -0500 Subject: [PATCH 22/56] Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: Casper da Costa-Luis --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 90e791ef61..d227bc0ef2 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -126,7 +126,7 @@ overwrite it unless you supply `--force`. ## Example: Sharing multiple experiments -You can create a loop to push or pull all experiments. For example on Linux +You can create a loop to push or pull all experiments. For example in a Linux terminal: ```dvc From 123767437526996732a6c8b1d8e6167426ba4740 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 8 Oct 2021 15:45:38 -0400 Subject: [PATCH 23/56] guide: roll back wrong files --- content/docs/sidebar.json | 1 - .../user-guide/basic-concepts/experiment.md | 2 +- .../experiment-management/checkpoints.md | 80 +++++++++++-------- .../experiment-management/dvc-experiments.md | 33 -------- 4 files changed, 47 insertions(+), 69 deletions(-) delete mode 100644 content/docs/user-guide/experiment-management/dvc-experiments.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 2d66666e9f..8e21510e77 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -147,7 +147,6 @@ "slug": "experiment-management", "source": "experiment-management/index.md", "children": [ - "dvc-experiments", "running-experiments", "sharing-experiments", "cleaning-experiments", diff --git a/content/docs/user-guide/basic-concepts/experiment.md 
b/content/docs/user-guide/basic-concepts/experiment.md index e0888ddf30..5f1228cc49 100644 --- a/content/docs/user-guide/basic-concepts/experiment.md +++ b/content/docs/user-guide/basic-concepts/experiment.md @@ -1,6 +1,6 @@ --- name: Experiment -match: [experiment, experiments, 'DVC experiments'] +match: [experiment, experiments] tooltip: >- An attempt to reach desired/better/interesting results during data pipelining or ML model development. DVC is designed to help [manage diff --git a/content/docs/user-guide/experiment-management/checkpoints.md b/content/docs/user-guide/experiment-management/checkpoints.md index f67a76f567..00396aaaf0 100644 --- a/content/docs/user-guide/experiment-management/checkpoints.md +++ b/content/docs/user-guide/experiment-management/checkpoints.md @@ -1,29 +1,27 @@ # Checkpoints -_New in DVC 2.0_ +ML checkpoints are an important part of deep learning because ML engineers like +to save the model files at certain points during a training process. With checkpoint experiments, you can: -_Checkpoint experiments_ track a series of variations (the checkpoints) and -their execution can be stopped and resumed as needed. You interact with them -using the `--rev` and `--reset` options of `dvc exp run` (see also the -`checkpoint` field in `dvc.yaml` `outs`). They can help you - -- implement the best practice in deep learning to save your model weights as +- Implement the best practice in deep learning to save your model weights as checkpoints. -- track all code and data changes corresponding to the checkpoints. -- see when metrics start diverging and revert to the optimal checkpoint. -- automate the process of tracking every training epoch. +- Track all code and data changes corresponding to the checkpoints. +- See when metrics start diverging and revert to the optimal checkpoint. +- Automate the process of tracking every training epoch. 
-> Experiments and checkpoints are [implemented](/blog/experiment-refs) with -> hidden Git experiment commits branches. +[The way checkpoints are implemented by DVC](/blog/experiment-refs) utilizes +_ephemeral_ experiment commits and experiment branches within DVC. They are +created using the metadata from experiments and are tracked with the `exps` +custom Git reference. -Like with regular experiments, checkpoints can become persistent by -[committing them to Git](#committing-checkpoints-to-git). +You can add experiments to your Git history by committing the experiment you +want to track, which you'll see later in this tutorial. -This guide covers how to implement checkpoints in an ML project using DVC. We're -going to train a model to identify handwritten digits based on the MNIST -dataset. +This tutorial is going to cover how to implement checkpoints in an ML project +using DVC. We're going to train a model to identify handwritten digits based on +the MNIST dataset. https://youtu.be/PcDo-hCvYpw @@ -34,7 +32,7 @@ https://youtu.be/PcDo-hCvYpw You can follow along with the steps here or you can clone the repo directly from GitHub and play with it. To clone the repo, run the following commands. -```dvc +```bash $ git clone https://github.com/iterative/checkpoints-tutorial $ cd checkpoints-tutorial ``` @@ -42,7 +40,7 @@ $ cd checkpoints-tutorial It is highly recommended you create a virtual environment for this example. You can do that by running: -```dvc +```bash $ python3 -m venv .venv ``` @@ -55,7 +53,7 @@ following commands. Once you have your environment set up, you can install the dependencies by running: -```dvc +```bash $ pip install -r requirements.txt ``` @@ -66,9 +64,9 @@ everything you need to get started with experiments and checkpoints. ## Setting up a DVC pipeline -DVC versions data and it also can version the ML model weights file as -checkpoints during the training process. To enable this, you will need to set up -a DVC pipeline to train your model. 
+DVC versions data and it also can version the machine learning model weights +file as checkpoints during the training process. To enable this, you will need +to set up a DVC pipeline to train your model. Adding a DVC pipeline only takes a few commands. At the root of the project, run: @@ -132,7 +130,7 @@ stages: Before we go any further, this is a great point to add these changes to your Git history. You can do that with the following commands: -```dvc +```bash $ git add . $ git commit -m "created DVC pipeline" ``` @@ -429,25 +427,39 @@ new set of checkpoints under a new experiment branch. └─────────────────────────┴──────────┴──────┴─────────┴────────┴────────┴────────┴──────────────┘ ``` -## Committing checkpoints to Git +## Adding checkpoints to Git When you terminate training, you'll see a few commands in the terminal that will -allow you to add these changes to Git, making them persistent: +allow you to add these changes to Git. -```dvc +``` To track the changes with git, run: git add dvclive.json dvc.yaml .gitignore train.py dvc.lock -... +Reproduced experiment(s): exp-263da +Experiment results have been applied to your workspace. + +To promote an experiment to a Git branch run: + + dvc exp branch ``` -Running the command above will stage the checkpoint experiment with Git. You can -take a look at what would be committed first with `git status`. You should see -something similar to this in your terminal: +You can run the following command to save your experiments to the Git history. -```dvc +```bash +$ git add dvclive.json dvc.yaml .gitignore train.py dvc.lock +``` + +You can take a look at what will be committed to your Git history by running: + +```bash $ git status +``` + +You should see something similar to this in your terminal. + +``` Changes to be committed: (use "git restore --staged ..." 
to unstage) new file: .gitignore @@ -464,9 +476,9 @@ Untracked files: predictions.json ``` -All that's left to do is to `git commit` the changes: +All that's left is to commit these changes with the following command: -```dvc +```bash $ git commit -m 'saved files from experiment' ``` diff --git a/content/docs/user-guide/experiment-management/dvc-experiments.md b/content/docs/user-guide/experiment-management/dvc-experiments.md deleted file mode 100644 index cdc71e77c1..0000000000 --- a/content/docs/user-guide/experiment-management/dvc-experiments.md +++ /dev/null @@ -1,33 +0,0 @@ -## DVC Experiments - -_New in DVC 2.0_ - -`dvc exp` commands let you automatically track a variation to an established -[data pipeline](/doc/command-reference/dag). You can create multiple isolated -experiments this way, as well as review, compare, and restore them later, or -roll back to the baseline. The basic workflow goes like this: - -- Modify stage parameters or other dependencies (e.g. input data, - source code) of committed stages. -- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results - are reflected in your workspace, and tracked automatically. -- Use `dvc metrics` to identify the best experiment(s). -- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat - 🔄 -- Use `dvc exp apply` to roll back to the best one. -- Make the selected experiment persistent by committing its results to Git. This - cleans the slate so you can repeat the process. - -## Persistent Experiments - -When your experiments are good enough to save or share, you may want to store -them persistently as Git commits in your repository. - -Whether the results were produced with `dvc repro` directly, or after a -`dvc exp` workflow, `dvc.yaml` and `dvc.lock` will define the experiment as a -new project version. The right outputs (including -[metrics](/doc/command-reference/metrics)) should also be present, or available -via `dvc checkout`. 
- -Use `dvc exp apply` and `dvc exp branch` to persist experiments in your Git -history. From ecbc8cb154f1d303396affef1e0285c7c01e7cb5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 8 Oct 2021 16:06:02 -0400 Subject: [PATCH 24/56] guide: roll back Exp Mgmt index... --- .../user-guide/experiment-management/index.md | 132 +++++++++++------- 1 file changed, 78 insertions(+), 54 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 4a6fe96d80..7841b5a8ab 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -1,71 +1,89 @@ # Experiment Management +_New in DVC 2.0_ + Data science and ML are iterative processes that require a large number of attempts to reach a certain level of a metric. Experimentation is part of the development of data features, hyperspace exploration, deep learning -optimization, etc. - -Some of DVC's base features already help you codify and analyze experiments. +optimization, etc. DVC helps you codify and manage all of your +experiments, supporting these main approaches: + +1. Create [experiments](#experiments) that derive from your latest project + version without having to track them manually. DVC does that automatically, + letting you list and compare them. The best ones can be made persistent, and + the rest archived. +2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series + of variations, forming a deep experiment. DVC helps you capture them at + runtime, and manage them in batches. +3. Make experiments or checkpoints [persistent](#persistent-experiments) by + committing them to your repository. Or create these versions + from scratch like typical project changes. + + At this point you may also want to consider the different + [ways to organize](#organization-patterns) experiments in your project (as + Git branches, as folders, etc.). 
+ +DVC also provides specialized features to codify and analyze experiments. [Parameters](/doc/command-reference/params) are simple values you can tweak in a -formatted text file; They cause different behaviors in your code and models. On -the other end, [metrics](/doc/command-reference/metrics) (and +human-readable text file, which cause different behaviors in your code and +models. On the other end, [metrics](/doc/command-reference/metrics) (and [plots](/doc/command-reference/plots)) let you define, visualize, and compare -quantitative measures of your results. +meaningful measures for the experimental results. -
+> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on +> introduction to DVC experiments. -## 💡 Run Cache: Automatic Log of Stage Runs +## Experiments -Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it -logs the unique signature of each stage run (in `.dvc/cache/runs` by default). -If it never happened before, the stage command(s) are executed normally. Every -subsequent time a [stage](/doc/command-reference/run) runs under the same -conditions, the previous results can be restored instantly, without wasting time -or computing resources. +`dvc exp` commands let you automatically track a variation to an established +[data pipeline](/doc/command-reference/dag). You can create multiple isolated +experiments this way, as well as review, compare, and restore them later, or +roll back to the baseline. The basic workflow goes like this: -✅ This built-in feature is called run-cache and it can -dramatically improve performance. It's enabled out-of-the-box (can be disabled), -which means DVC is already saving all of your tests and experiments behind the -scene. But there's no easy way to explore it. +- Modify stage parameters or other dependencies (e.g. input data, + source code) of committed stages. +- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results + are reflected in your workspace, and tracked automatically. +- Use [metrics](/doc/command-reference/metrics) to identify the best + experiment(s). +- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat + 🔄 +- Use `dvc exp apply` to roll back to the best one. +- Make the selected experiment persistent by committing its results to Git. This + cleans the slate so you can repeat the process. -
+## Checkpoints in source code -## DVC Experiments +To track successive steps in a longer experiment, you can register checkpoints +from your code at runtime. This allows you, for example, to track the progress +in deep learning techniques such as evolving neural networks. -_New in DVC 2.0_ +This kind of experiments track a series of variations (the checkpoints) and its +execution can be stopped and resumed as needed. You interact with them using +`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint` +field in `dvc.yaml` `outs`). -The `dvc experiments` features are designed to support these main approaches: - -1. Create [experiments] that derive from your latest project version without - polluting your Git history. DVC tracks them for you, letting you list and - compare them. The best ones can be made persistent, and the rest left as - history or cleared. -1. [Queue] and process series of experiments based on a parameter search or - other modifications to your baseline. -1. Generate [checkpoints] during your code execution to analyze the internal - progress of deep experiments. DVC captures them at runtime, and can manage - them in batches. -1. Make experiments [persistent] by committing them to your - repository history. -1. Easily [share experiments] using Git and DVC remotes. - -[experiments]: /doc/user-guide/experiment-management/dvc-experiments -[queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution -[checkpoints]: /doc/user-guide/experiment-management/checkpoints -[persistent]: - /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments -[share and isolate]: /doc/user-guide/experiment-management/sharing-experiments +> 📖 To learn more, see the dedicated +> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide. -> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on -> introduction to DVC experiments. 
+## Persistent experiments -More information in the -[full guide](/doc/user-guide/experiment-management/dvc-experiments). +When your experiments are good enough to save or share, you may want to store +them persistently as Git commits in your repository. -### Organization Patterns +Whether the results were produced with `dvc repro` directly, or after a +`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock` +pair in the workspace will codify the experiment as a new project +version. The right outputs (including +[metrics](/doc/command-reference/metrics)) should also be present, or available +via `dvc checkout`. -It's up to you to decide how to organize completed experiments. These are the -main alternatives: +### Organization patterns + +DVC takes care of arranging `dvc exp` experiments and the data +cache under the hood. But when it comes to full-blown persistent +experiments, it's up to you to decide how to organize them in your project. +These are the main alternatives: - **Git tags and branches** - use the repo's "time dimension" to distribute your experiments. This makes the most sense for experiments that build on each @@ -81,9 +99,15 @@ main alternatives: Completely independent experiments live in separate directories, while their progress can be found in different branches. -DVC takes care of arranging `dvc exp` experiments and the data -cache under the hood so there's no need to decide on the above -until your `dvc experiments` are made [persistent]. +## Automatic log of stage runs (run-cache) -[persistent]: - /doc/user-guide/experiment-management/dvc-experiments#persistent-experiments +Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the +unique signature of each stage run (to `.dvc/cache/runs` by default). If it +never happened before, the stage command(s) are executed normally. 
Every +subsequent time a [stage](/doc/command-reference/run) runs under the same +conditions, the previous results can be restored instantly, without wasting time +or computing resources. + +✅ This built-in feature is called run-cache and it can +dramatically improve performance. It's enabled out-of-the-box (but can be +disabled with the `--no-run-cache` command option). From 5f1f8a84ee060e731f55a8e7664986e90179427a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 8 Oct 2021 16:06:44 -0400 Subject: [PATCH 25/56] guide: link to Sharing Exps from index --- content/docs/user-guide/experiment-management/index.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 7841b5a8ab..e9896c985a 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -12,9 +12,11 @@ optimization, etc. DVC helps you codify and manage all of your version without having to track them manually. DVC does that automatically, letting you list and compare them. The best ones can be made persistent, and the rest archived. + 2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series of variations, forming a deep experiment. DVC helps you capture them at runtime, and manage them in batches. + 3. Make experiments or checkpoints [persistent](#persistent-experiments) by committing them to your repository. Or create these versions from scratch like typical project changes. @@ -23,6 +25,10 @@ optimization, etc. DVC helps you codify and manage all of your [ways to organize](#organization-patterns) experiments in your project (as Git branches, as folders, etc.). +4. Easily [share experiments] using Git and DVC remotes. + +[share experiments]: /doc/user-guide/experiment-management/sharing-experiments + DVC also provides specialized features to codify and analyze experiments. 
[Parameters](/doc/command-reference/params) are simple values you can tweak in a human-readable text file, which cause different behaviors in your code and From fdd38e2b60aacec8569188f2ed4e866b38cc65a1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 8 Oct 2021 20:53:17 -0400 Subject: [PATCH 26/56] guide: Listing exps on remotes per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775502448 --- .../sharing-experiments.md | 21 ++++--------------- 1 file changed, 4 insertions(+), 17 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index d227bc0ef2..c8bf3f9428 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -57,13 +57,11 @@ performance also depend on the connection bandwidth and remote configurations. [run-cache]: /doc/user-guide/project-structure/internal-files#run-cache -## Listing remote experiments +## Listing experiments on remotes -In order to list experiments in a DVC project, you can use the `dvc exp list` -command. With no command line options, it lists the experiments in the current -project. - -You can supply a Git remote name to list the experiments: +You can use the `dvc exp list` command to list experiments. With no arguments, +it lists the experiments in the current project. You can supply a Git remote +name to list the experiments that have been pushed there: ```dvc $ dvc exp list origin @@ -91,17 +89,6 @@ main: ... ``` -When you don't need to see the parent commits, you can list experiment names -only, with `--names-only`: - -```dvc -$ dvc exp list origin --names-only -cnn-128 -cnn-32 -cnn-64 -cnn-96 -``` - ## Downloading experiments When you clone a DVC repository, it doesn't fetch any experiments by default.
In From 1c46961817f76048a9b340523135891593822f82 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 8 Oct 2021 21:00:52 -0400 Subject: [PATCH 27/56] guide: don't mention Git here... per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775506755 --- content/docs/user-guide/experiment-management/index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index e9896c985a..830a4399dd 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -21,11 +21,11 @@ optimization, etc. DVC helps you codify and manage all of your committing them to your repository. Or create these versions from scratch like typical project changes. - At this point you may also want to consider the different - [ways to organize](#organization-patterns) experiments in your project (as - Git branches, as folders, etc.). + > At this point you may also want to consider the different + > [ways to organize](#organization-patterns) experiments in your project as + > well. -4. Easily [share experiments] using Git and DVC remotes. +4. Easily [share experiments] across your team. 
[share experiments]: /doc/user-guide/experiment-management/sharing-experiments From 0f1b7efb4e21367c910815f93e8235ecc4d71356 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 9 Oct 2021 19:41:38 -0400 Subject: [PATCH 28/56] guide: clarify that git is needed for exps and sharing per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775506990 --- .../user-guide/experiment-management/index.md | 3 ++ .../sharing-experiments.md | 32 +++++++++---------- 2 files changed, 19 insertions(+), 16 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 830a4399dd..dfb3cada24 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -41,6 +41,9 @@ meaningful measures for the experimental results. ## Experiments +> ⚠️ Note: these features require the project a Git repository. You do not need +> to use Git yourself, however. + `dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). You can create multiple isolated experiments this way, as well as review, compare, and restore them later, or diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index c8bf3f9428..b47444aa98 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,25 +1,25 @@ # Sharing Experiments -Two types of _remotes_ are needed to upload experiments for sharing. -[Git remotes](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes) -are distributed copies of the Git repository, hosted for example on GitHub or -GitLab. Small files like experimental code and -[DVC metafiles](/doc/user-guide/project-structure) files will go there. 
-[DVC remotes](/doc/command-reference/remote) on the other hand are data storage -locations (e.g. Amazon S3 or Google Drive). You can use them to back up and -[share data](/doc/use-cases/sharing-data-and-model-files) files and directories -that don't fit inside Git repos. - -> See this [Git remotes guide] and `dvc remote add` for info. on setting them -> up. -> ⚠️ Note that only [SSH Git URLs] support DVC experiment sharing. +> ⚠️ Note: Since a Git repository is required for experiment management +> features, in order to share them you will also need to have a [Git remote] +> setup. Only [SSH Git URLs] are supported. +[git remote]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes [ssh git urls]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols -[git remotes guide]: - https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -You can list your remotes with `git remote -v` and `dvc remote list`: +Sharing experiments is similar to [sharing regular project data] and artifacts +by pushing (uploading) and pulling (downloading) from remotes. DVC-tracked data, +models, etc. are in your project's cache and thus will be +transferred to/from [DVC remote storage](/doc/command-reference/remote) (e.g. +Amazon S3 or Google Drive). Small files like experimental code and +[DVC metafiles](/doc/user-guide/project-structure) files are stored and shared +with Git automatically (you don't need to worry about using Git directly). 
+ +[sharing regular project data]: /doc/use-cases/sharing-data-and-model-files + +Start by making sure you have your remotes setup with `git remote -v` and +`dvc remote list`: ```dvc $ git remote -v From dcd69861fa776b019f14613a4a61f7708c4a2b2e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 11 Oct 2021 15:41:35 -0400 Subject: [PATCH 29/56] guide: clarify note on Git requirement for DVC Exps per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775743738 --- content/docs/user-guide/experiment-management/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index dfb3cada24..d4d27d6deb 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -41,8 +41,8 @@ meaningful measures for the experimental results. ## Experiments -> ⚠️ Note: these features require the project a Git repository. You do not need -> to use Git yourself, however. +> Note: these features require a Git repository. Advanced Git operations are +> handled automatically by DVC. `dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). 
You can create multiple isolated From 5e8dc5ee9d535ac3706fa41b67c133f4520da066 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 11 Oct 2021 16:09:32 -0400 Subject: [PATCH 30/56] guide: simplify Sharing Exps intro (rel Git) per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775746073 --- .../sharing-experiments.md | 54 ++++++++++--------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index b47444aa98..3cda0c8437 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,24 +1,27 @@ # Sharing Experiments -> ⚠️ Note: Since a Git repository is required for experiment management -> features, in order to share them you will also need to have a [Git remote] -> setup. Only [SSH Git URLs] are supported. +If your team shares [Git remotes] on a Git server or hosting service (e.g. GitHub, +GitLab, etc.) to collaborate on projects, then you can also use them to save and +share DVC Experiments. You will also need DVC +[remote storage](/doc/command-reference/remote) set up. -[git remote]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -[ssh git urls]: - https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols +[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes + +
+ +## ⚙️ Expand to learn more. Sharing experiments is similar to [sharing regular project data] and artifacts -by pushing (uploading) and pulling (downloading) from remotes. DVC-tracked data, -models, etc. are in your project's cache and thus will be -transferred to/from [DVC remote storage](/doc/command-reference/remote) (e.g. -Amazon S3 or Google Drive). Small files like experimental code and -[DVC metafiles](/doc/user-guide/project-structure) files are stored and shared -with Git automatically (you don't need to worry about using Git directly). +by synchronizing from remotes. DVC-tracked data, models, etc. are in your +project's cache and thus will be transferred to/from +[remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google +Drive). Small files like experimental code and +[DVC metafiles](/doc/user-guide/project-structure) files are uploaded or +downloaded with Git automatically as needed. [sharing regular project data]: /doc/use-cases/sharing-data-and-model-files -Start by making sure you have your remotes setup with `git remote -v` and +You can check you have all the necessary remotes setup with `git remote -v` and `dvc remote list`: ```dvc @@ -30,24 +33,27 @@ $ dvc remote list storage s3://mybucket/my-dvc-store ``` +
+ +> ⚠️ Note that only [SSH Git URLs] are compatible with DVC Experiment sharing. + +[ssh git urls]: + https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols + ## Uploading experiments -You can upload an experiment and its files to both remotes using `dvc exp push` -(requires the Git remote name and experiment name as arguments). +You can upload an experiment with all of it's files and data using +`dvc exp push` (requires a Git remote name and experiment name as arguments). + +> 💡 You can use `dvc exp show` to find experiment names. ```dvc $ dvc exp push origin exp-abc123 ``` -> Use `dvc exp show` to find experiment names. - -This pushes the necessary DVC-tracked files from the cache to the default DVC -remote (similar to `dvc push`). You can prevent this behavior by using the -`--no-cache` option to the command above. - -If there's no default DVC remote, it will ask you to define one with -`dvc remote default`. If you don't want a default remote, or if you want to use -a different remote, you can specify one with the `--remote` (`-r`) option. +The [default DVC remote](/doc/command-reference/remote/default) is used unless +one is specified with the `--remote` (`-r`) option. To prevent pushing +DVC-tracked files to remote storage altogether, use the `--no-cache` option. DVC can use multiple threads to upload files (4 per CPU core by default). You can set the number with `--jobs` (`-j`). 
Please note that increases in From 5b3306af0544d8d2abb875ceb37a58e19aba50b9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 11 Oct 2021 16:14:41 -0400 Subject: [PATCH 31/56] guide: rename exp list -r section --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 3cda0c8437..e639021a41 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -63,7 +63,7 @@ performance also depend on the connection bandwidth and remote configurations. [run-cache]: /doc/user-guide/project-structure/internal-files#run-cache -## Listing experiments on remotes +## Listing experiments saved on remotes You can use the `dvc exp list` command to list experiments. (with no arguments it lists the experiments in the current project. You can supply a Git remote From 003d38a5c9ad4a57ecd4ea1b9850373c3819a970 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 16 Oct 2021 21:48:49 -0400 Subject: [PATCH 32/56] copy edit --- .../user-guide/experiment-management/sharing-experiments.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index e639021a41..4d5503be52 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,8 +1,8 @@ # Sharing Experiments -If your team shares [Git remotes] on a Git server or hosting (e.g. GitHub, -GitLab, etc.) to collaborate on projects, then you can also use it to save and -share DVC Experiments. 
You will also need DVC +If your team uses [Git remotes] on a Git server or hosting (e.g. GitHub, GitLab, +etc.) to collaborate on projects, then you can also use them to save and share +DVC Experiments remotely. You will also need DVC [remote storage](/doc/command-reference/remote) setup. [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From 69476edf5094bc0cbb670c6ef2a43a4f3d7e670d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Oct 2021 00:11:58 -0500 Subject: [PATCH 33/56] cases: simplify note about requiring Git per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775743738 --- content/docs/user-guide/experiment-management/index.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index d4d27d6deb..93c3bbe617 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -41,8 +41,7 @@ meaningful measures for the experimental results. ## Experiments -> Note: these features require a Git repository. Advanced Git operations are -> handled automatically by DVC. +> Note: these features require a Git repository. `dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). 
You can create multiple isolated From f7c94a6c9ade714c5728009c35a147002cf78f8d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Oct 2021 00:41:13 -0500 Subject: [PATCH 34/56] guide: emoji for example in Sharing Exps per https://github.com/iterative/dvc.org/pull/2908#discussion_r737121788 --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 4d5503be52..b3fcd16a5c 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -117,7 +117,7 @@ can set the number with `--jobs` (`-j`). If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. -## Example: Sharing multiple experiments +## 👨‍🏫 Example: Sharing multiple experiments You can create a loop to push or pull all experiments. For example in a Linux terminal: From ad1f508651cff327004bd27f2f42f6531e0e8166 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Oct 2021 00:44:43 -0500 Subject: [PATCH 35/56] guide: clarify note about Git-DVC repo required for Exps per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-776668543 --- content/docs/user-guide/experiment-management/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 93c3bbe617..e8d489df9e 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -41,7 +41,7 @@ meaningful measures for the experimental results. ## Experiments -> Note: these features require a Git repository. +> Note: these features require a Git-enabled DVC repository. 
`dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). You can create multiple isolated From 3bff416121a58e909d55656309828af3847e64df Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Oct 2021 00:47:02 -0500 Subject: [PATCH 36/56] Update content/docs/user-guide/experiment-management/sharing-experiments.md --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index b3fcd16a5c..6db7be7eb2 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -42,7 +42,7 @@ storage s3://mybucket/my-dvc-store ## Uploading experiments -You can upload an experiment with all of it's files and data using +You can upload an experiment with all of its files and data using `dvc exp push` (requires a Git remote name and experiment name as arguments). > 💡 You can use `dvc exp show` to find experiment names. 
From bf6a58bff0071291f7f51cb4768b113b85b02822 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Oct 2021 00:51:14 -0500 Subject: [PATCH 37/56] guide: another example emoji en Sharing Exps --- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index b3fcd16a5c..61527754da 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -128,7 +128,7 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` -## Example: Dedicated experiment directories +## 👨‍🏫 Example: Dedicated experiment directories A good way to isolate experiments is to create a separate directory outside the current repository for each one. From 9bb3a2a0ab4bd87de5dee79567c333aae844e094 Mon Sep 17 00:00:00 2001 From: "restyled-io[bot]" <32688539+restyled-io[bot]@users.noreply.github.com> Date: Wed, 27 Oct 2021 00:52:37 -0500 Subject: [PATCH 38/56] Restyled by prettier (#2972) Co-authored-by: Restyled.io --- .../user-guide/experiment-management/sharing-experiments.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 6db7be7eb2..16554977c4 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -42,8 +42,8 @@ storage s3://mybucket/my-dvc-store ## Uploading experiments -You can upload an experiment with all of its files and data using -`dvc exp push` (requires a Git remote name and experiment name as arguments). 
+You can upload an experiment with all of its files and data using `dvc exp push` +(requires a Git remote name and experiment name as arguments). > 💡 You can use `dvc exp show` to find experiment names. From 32ccd25bda763e13ba8da658ac9b099ad0c5979e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 4 Nov 2021 12:55:50 -0600 Subject: [PATCH 39/56] guide: list exps in Comparing guide, linked from Sharing per https://github.com/iterative/dvc.org/pull/2908#discussion_r725406368 --- .../comparing-experiments.md | 20 +++++----- .../sharing-experiments.md | 38 +++---------------- 2 files changed, 15 insertions(+), 43 deletions(-) diff --git a/content/docs/user-guide/experiment-management/comparing-experiments.md b/content/docs/user-guide/experiment-management/comparing-experiments.md index 93f4e6af1d..eefcac5e08 100644 --- a/content/docs/user-guide/experiment-management/comparing-experiments.md +++ b/content/docs/user-guide/experiment-management/comparing-experiments.md @@ -6,7 +6,7 @@ how they can help you streamline the experimentation process. ## List experiments in the workspace -You can get a list of existing experiments in the repository with +You can get a list of existing experiments in the repository with `dvc exp list`. Without any options, this command lists the experiments based on the latest commit of the current branch (Git `HEAD`). @@ -17,8 +17,8 @@ refs/tags/baseline-experiment: cnn-128 ``` -If you want to list all the experiments in the repository regardless of their -parent commit, use the `--all` flag. +If you want to list all the experiments in the repo regardless of their parent +commit, use the `--all` flag. ```dvc $ dvc exp list --all @@ -29,11 +29,11 @@ main: exp-93150 ``` -## List experiments in another Git remote +## List experiments saved remotely -As we discuss in [Sharing Experiments], you can use `dvc exp push` to upload -experiments to Git remotes. `dvc exp list` can be used to list the experiments -in a Git remote. 
+Experiments can be [shared] (with `dvc exp push`) from another location. To +review experiments uploaded to a remote repository (which you may +not have locally), provide a Git remote name to `dvc exp list`. ```dvc $ dvc exp list origin @@ -42,10 +42,10 @@ refs/tags/baseline-experiment: cnn-64 ``` -This command lists the experiments originated from `HEAD`. You can add any other -options to the remote command, including `--all`. (see previous section). +This command lists remote experiments originated from `HEAD`. You can add any +other options to the remote command, including `--all` (see previous section). -[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments +[shared]: /doc/user-guide/experiment-management/sharing-experiments ## List experiment names to use in scripts diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index f4311ce4e6..d23320ae74 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -59,42 +59,14 @@ DVC can use multiple threads to upload files (4 per CPU core by default). You can set the number with `--jobs` (`-j`). Please note that increases in performance also depend on the connection bandwidth and remote configurations. +Once pushed, you can easily [list remote experiments] (with `dvc exp list`). + > 📖 See also the [run-cache] mechanism. +[list remote experiments]: + /doc/user-guide/experiment-management/comparing-experiments#list-experiments-saved-remotely [run-cache]: /doc/user-guide/project-structure/internal-files#run-cache -## Listing experiments saved on remotes - -You can use the `dvc exp list` command to list experiments. (with no arguments -it lists the experiments in the current project. 
You can supply a Git remote -name to list the experiments that have been pushed there: - -```dvc -$ dvc exp list origin -main: - cnn-128 - cnn-32 - cnn-64 - cnn-96 -``` - -Note that by default this only lists experiments derived from the current commit -(local `HEAD` or default remote branch). You can list all the experiments -(derived from from every branch and commit) with the `--all` option: - -```dvc -$ dvc exp list origin --all -0b5bedd: - exp-9edbe -0f73830: - exp-280e9 - exp-4cd96 - ... -main: - cnn-128 - ... -``` - ## Downloading experiments When you clone a DVC repository, it doesn't fetch any experiments by default. In @@ -102,7 +74,7 @@ order to get them, use `dvc exp pull` (with the Git remote and the experiment name), for example: ```dvc -$ dvc exp pull origin cnn-64 +$ dvc exp pull origin cnn-32 ``` This pulls all the necessary files from both remotes. Again, you need to have From 73fccdd73747a13de4aa15b576ff460ac868ae99 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 6 Nov 2021 19:00:38 -0600 Subject: [PATCH 40/56] guide: address feedback from https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-794875761 and below --- .../user-guide/experiment-management/index.md | 8 ++++---- .../experiment-management/sharing-experiments.md | 15 +++++++++------ 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index e8d489df9e..e1f562d985 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -25,7 +25,7 @@ optimization, etc. DVC helps you codify and manage all of your > [ways to organize](#organization-patterns) experiments in your project as > well. -4. Easily [share experiments] across your team. +4. [Share experiments] across your team seamlessly. 
[share experiments]: /doc/user-guide/experiment-management/sharing-experiments @@ -41,8 +41,6 @@ meaningful measures for the experimental results. ## Experiments -> Note: these features require a Git-enabled DVC repository. - `dvc exp` commands let you automatically track a variation to an established [data pipeline](/doc/command-reference/dag). You can create multiple isolated experiments this way, as well as review, compare, and restore them later, or @@ -60,6 +58,8 @@ roll back to the baseline. The basic workflow goes like this: - Make the selected experiment persistent by committing its results to Git. This cleans the slate so you can repeat the process. +> Note that these features won't work if you don't use Git to version your code. + ## Checkpoints in source code To track successive steps in a longer experiment, you can register checkpoints @@ -96,7 +96,7 @@ These are the main alternatives: - **Git tags and branches** - use the repo's "time dimension" to distribute your experiments. This makes the most sense for experiments that build on each other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can - be easily visualized, for example with tools + be visualized easily, for example with tools [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). - **Directories** - the project's "space dimension" can be structured with directories (folders) to organize experiments. Useful when you want to see all diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index d23320ae74..a06ccffd5d 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,9 +1,11 @@ # Sharing Experiments -If your team uses [Git remotes] on a Git server or hosting (e.g. GitHub, GitLab, -etc.) 
to collaborate on projects, then you can also use them to save and share -DVC Experiments remotely. You will also need DVC -[remote storage](/doc/command-reference/remote) setup. +If your team uses Git server or hosting (e.g. GitHub, GitLab, etc.) to +collaborate on projects, you can also use them to save and share DVC +Experiments. + +> You will need both [Git remotes] and DVC +> [remote storage](/doc/command-reference/remote) for this. [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes @@ -42,8 +44,9 @@ storage s3://mybucket/my-dvc-store ## Uploading experiments -You can upload an experiment with all of its files and data using `dvc exp push` -(requires a Git remote name and experiment name as arguments). +You can upload an experiment with all of its files and data using +`dvc exp push`, which takes a Git remote name and an experiment ID or name as +arguments. > 💡 You can use `dvc exp show` to find experiment names. From 20943b546bdaa8e81d9d3f7ebb56a8b760bd625b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 7 Nov 2021 22:05:53 -0800 Subject: [PATCH 41/56] guide: rephrase Git history exps org per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-794875761 --- content/docs/user-guide/experiment-management/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index e1f562d985..45da2471eb 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -95,9 +95,9 @@ These are the main alternatives: - **Git tags and branches** - use the repo's "time dimension" to distribute your experiments. This makes the most sense for experiments that build on each - other. 
Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can - be visualized easily, for example with tools - [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). + other. Git-based experiment structures are especially helpful along with Git + history exploration tools + [like in GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). - **Directories** - the project's "space dimension" can be structured with directories (folders) to organize experiments. Useful when you want to see all your experiments at the same time (without switching versions) by just From fef15fe4612887fbbd98081a578c1e48c78bac83 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Nov 2021 16:17:38 -0800 Subject: [PATCH 42/56] guide:address Exp sharing feedback from from https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-799490975 and below --- .../user-guide/experiment-management/index.md | 4 ++-- .../sharing-experiments.md | 18 +++++++----------- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 45da2471eb..555e566a84 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -46,6 +46,8 @@ meaningful measures for the experimental results. experiments this way, as well as review, compare, and restore them later, or roll back to the baseline. The basic workflow goes like this: +> Note that these features won't work if you don't version your code with Git. + - Modify stage parameters or other dependencies (e.g. input data, source code) of committed stages. - Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results @@ -58,8 +60,6 @@ roll back to the baseline. 
The basic workflow goes like this: - Make the selected experiment persistent by committing its results to Git. This cleans the slate so you can repeat the process. -> Note that these features won't work if you don't use Git to version your code. - ## Checkpoints in source code To track successive steps in a longer experiment, you can register checkpoints diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index a06ccffd5d..481ba0df44 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -4,26 +4,22 @@ If your team uses Git server or hosting (e.g. GitHub, GitLab, etc.) to collaborate on projects, you can also use them to save and share DVC Experiments. -> You will need both [Git remotes] and DVC -> [remote storage](/doc/command-reference/remote) for this. - -[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -
## ⚙️ Expand to learn more. Sharing experiments is similar to [sharing regular project data] and artifacts -by synchronizing from remotes. DVC-tracked data, models, etc. are in your -project's cache and thus will be transferred to/from +by synchronizing via DVC and Git remotes. DVC-tracked data, models, etc. are in +your project's cache and thus will be transferred to/from [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google Drive). Small files like experimental code and [DVC metafiles](/doc/user-guide/project-structure) files are uploaded or -downloaded with Git automatically as needed. +downloaded from [Git remotes] by DVC. [sharing regular project data]: /doc/use-cases/sharing-data-and-model-files +[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -You can check you have all the necessary remotes setup with `git remote -v` and +You can check you have the necessary remotes setup with `git remote -v` and `dvc remote list`: ```dvc @@ -92,7 +88,7 @@ can set the number with `--jobs` (`-j`). If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. -## 👨‍🏫 Example: Sharing multiple experiments +## Example: Sharing multiple experiments You can create a loop to push or pull all experiments. For example in a Linux terminal: @@ -103,7 +99,7 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` -## 👨‍🏫 Example: Dedicated experiment directories +## Example: Dedicated experiment directories A good way to isolate experiments is to create a separate directory outside the current repository for each one. 
From c0876efaff06c4e853c309bc29ceac10622d493c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Nov 2021 16:28:11 -0800 Subject: [PATCH 43/56] guide: update Git remote auth limitation wording per https://github.com/iterative/dvc.org/pull/2908#discussion_r740764092 --- .../experiment-management/sharing-experiments.md | 2 +- content/docs/user-guide/troubleshooting.md | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 481ba0df44..0abad132d4 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -33,7 +33,7 @@ storage s3://mybucket/my-dvc-store
-> ⚠️ Note that only [SSH Git URLs] are compatible with DVC Experiment sharing. +> ⚠️ Note that DVC can only authenticate with Git remotes using [SSH URLs]. [ssh git urls]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols diff --git a/content/docs/user-guide/troubleshooting.md b/content/docs/user-guide/troubleshooting.md index 4804adb508..2eef09b77f 100644 --- a/content/docs/user-guide/troubleshooting.md +++ b/content/docs/user-guide/troubleshooting.md @@ -92,16 +92,17 @@ using: $ dvc checkout --relink ``` -## HTTP Git authentication is not supported {#git-auth} +## DVC can only authenticate with Git remotes using SSH URLs {#git-auth} [Experiment sharing](/doc/user-guide/experiment-management/sharing-experiments) -commands accept a `git_remote` argument. In order to access the Git remote, you -may need to authenticate for _write_ (`dvc exp push`) or _read_ (`dvc exp list`, +commands accept a `git_remote` argument. You may need to authenticate to use the +Git remote, for _write_ (`dvc exp push`) or _read_ (`dvc exp list`, `dvc exp pull`) permissions. DVC does not currently support authentication with [Git credentials]. This means that unless the Git server allows unauthenticated HTTP write/read, you should -use an [SSH Git URL] when listing, pulling or pushing experiments. +use an [SSH Git URL] for Git remotes used for listing, pulling or pushing +experiments. 
[git credentials]: https://git-scm.com/docs/gitcredentials [ssh git url]: From 0552ef87c712b5e773003dce5586620c7f093d36 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 15 Nov 2021 15:17:24 -0600 Subject: [PATCH 44/56] guide: more copy edits on Exp Sharing and Comparing --- .../experiment-management/comparing-experiments.md | 5 +++-- .../experiment-management/sharing-experiments.md | 12 ++++++++---- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/content/docs/user-guide/experiment-management/comparing-experiments.md b/content/docs/user-guide/experiment-management/comparing-experiments.md index 5974896ad9..b49ad35f42 100644 --- a/content/docs/user-guide/experiment-management/comparing-experiments.md +++ b/content/docs/user-guide/experiment-management/comparing-experiments.md @@ -41,8 +41,9 @@ refs/tags/baseline-experiment: cnn-64 ``` -This command lists remote experiments originated from `HEAD`. You can add any -other options to the remote command, including `--all` (see previous section). +This command lists remote experiments based on `HEAD`. You can use `--all` to +list all experiments, or add any other supported option to the remote +`dvc exp list` command. [shared]: /doc/user-guide/experiment-management/sharing-experiments diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 0abad132d4..64567aec58 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,8 +1,7 @@ # Sharing Experiments -If your team uses Git server or hosting (e.g. GitHub, GitLab, etc.) to -collaborate on projects, you can also use them to save and share DVC -Experiments. +DVC can rely on existing Git servers or hosting (e.g. GitHub, GitLab, etc.) to +save and share DVC Experiments.
@@ -14,7 +13,7 @@ your project's cache and thus will be transferred to/from [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google Drive). Small files like experimental code and [DVC metafiles](/doc/user-guide/project-structure) files are uploaded or -downloaded from [Git remotes] by DVC. +downloaded to/from [Git remotes] by DVC. [sharing regular project data]: /doc/use-cases/sharing-data-and-model-files [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes @@ -99,6 +98,11 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` +> 📖 See also [Listing experiments]. + +[listing experiments]: + /doc/user-guide/experiment-management/comparing-experiments#list-experiments-in-the-project + ## Example: Dedicated experiment directories A good way to isolate experiments is to create a separate directory outside the From 15b93d155a2348a9c1b56c09ed78f810ee815f5f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 16 Nov 2021 19:11:08 -0600 Subject: [PATCH 45/56] guide: clarify `exp list` remote info per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-798148999 --- .../user-guide/experiment-management/comparing-experiments.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/experiment-management/comparing-experiments.md b/content/docs/user-guide/experiment-management/comparing-experiments.md index b49ad35f42..93dc87e173 100644 --- a/content/docs/user-guide/experiment-management/comparing-experiments.md +++ b/content/docs/user-guide/experiment-management/comparing-experiments.md @@ -41,8 +41,8 @@ refs/tags/baseline-experiment: cnn-64 ``` -This command lists remote experiments based on `HEAD`. You can use `--all` to -list all experiments, or add any other supported option to the remote +This command lists remote experiments based on that repo's `HEAD`. 
You can use +`--all` to list all experiments, or add any other supported option to the remote `dvc exp list` command. [shared]: /doc/user-guide/experiment-management/sharing-experiments From 789dad3945ef76a3c5a04184429e95e83fa0cb61 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Dec 2021 12:52:18 -0600 Subject: [PATCH 46/56] guide: un0hide exp sharing details per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-811970105 --- .../experiment-management/sharing-experiments.md | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 64567aec58..b02e35892b 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -3,12 +3,8 @@ DVC can rely on existing Git servers or hosting (e.g. GitHub, GitLab, etc.) to save and share DVC Experiments. -
- -## ⚙️ Expand to learn more. - Sharing experiments is similar to [sharing regular project data] and artifacts -by synchronizing via DVC and Git remotes. DVC-tracked data, models, etc. are in +by synchronizing with DVC and Git remotes. DVC-tracked data, models, etc. are in your project's cache and thus will be transferred to/from [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google Drive). Small files like experimental code and @@ -30,8 +26,6 @@ $ dvc remote list storage s3://mybucket/my-dvc-store ``` -
- > ⚠️ Note that DVC can only authenticate with Git remotes using [SSH URLs]. [ssh git urls]: From e58b4bfa6a9878ddbccdf1bfa6dcef51b4b606e9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Dec 2021 13:17:37 -0600 Subject: [PATCH 47/56] guide: move multi-exp share example to how-to per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-811970777 --- content/docs/sidebar.json | 24 +++++++++---------- .../sharing-experiments.md | 23 ++++-------------- .../how-to/share-many-experiments.md | 19 +++++++++++++++ 3 files changed, 35 insertions(+), 31 deletions(-) create mode 100644 content/docs/user-guide/how-to/share-many-experiments.md diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 4621d1e014..2ede56928c 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -131,18 +131,6 @@ } ] }, - { - "label": "How To", - "slug": "how-to", - "source": false, - "children": [ - "stop-tracking-data", - "update-tracked-data", - "add-deps-or-outs-to-a-stage", - "merge-conflicts", - "share-a-dvc-cache" - ] - }, { "label": "Experiment Management", "slug": "experiment-management", @@ -156,6 +144,18 @@ "checkpoints" ] }, + { + "slug": "how-to", + "source": false, + "children": [ + "stop-tracking-data", + "update-tracked-data", + "add-deps-or-outs-to-a-stage", + "merge-conflicts", + "share-a-dvc-cache", + "share-many-experiments" + ] + }, "setup-google-drive-remote", "large-dataset-optimization", "external-dependencies", diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index b02e35892b..2eb800ad8e 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -51,13 +51,14 @@ DVC can use multiple threads to upload files (4 per CPU core by default). You can set the number with `--jobs` (`-j`). 
Please note that increases in performance also depend on the connection bandwidth and remote configurations. -Once pushed, you can easily [list remote experiments] (with `dvc exp list`). +Once pushed, you can easily [list remote experiments] (with `dvc exp list`). To +pus -> 📖 See also the [run-cache] mechanism. +> See also [How to Share Many Experiments][share many]. [list remote experiments]: /doc/user-guide/experiment-management/comparing-experiments#list-experiments-saved-remotely -[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache +[share many]: /doc/user-guide/how-to/share-many-experiments ## Downloading experiments @@ -81,22 +82,6 @@ can set the number with `--jobs` (`-j`). If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. -## Example: Sharing multiple experiments - -You can create a loop to push or pull all experiments. For example in a Linux -terminal: - -```dvc -$ dvc exp list --all --names-only | while read -r expname ; do \ - dvc exp pull origin ${expname} \ -done -``` - -> 📖 See also [Listing experiments]. - -[listing experiments]: - /doc/user-guide/experiment-management/comparing-experiments#list-experiments-in-the-project - ## Example: Dedicated experiment directories A good way to isolate experiments is to create a separate directory outside the diff --git a/content/docs/user-guide/how-to/share-many-experiments.md b/content/docs/user-guide/how-to/share-many-experiments.md new file mode 100644 index 0000000000..f98cb23c62 --- /dev/null +++ b/content/docs/user-guide/how-to/share-many-experiments.md @@ -0,0 +1,19 @@ +# How to Share Many Experiments + +`dvc exp push` and `dvc exp pull` allow us to [share experiments] between +repositories via existing DVC and Git remotes. These however work on individual +experiments.
+ +Here's a simple shell loop to push or pull all experiments (Linux): + +```dvc +$ dvc exp list --all --names-only | while read -r expname ; do \ + dvc exp pull origin ${expname} ; \ +done +``` + +> 📖 See [Listing Experiments] for more info on `dvc exp list`. + +[share experiments]: /doc/user-guide/experiment-management/sharing-experiments +[listing experiments]: + /doc/user-guide/experiment-management/comparing-experiments#list-experiments-in-the-project From 429733d7f368a92993b88e3837043ed929aea013 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Dec 2021 13:51:26 -0600 Subject: [PATCH 48/56] guide: simplify Exp Sharing intro, add diagram per should be focusing more on explaining (in simple terms, with diagrams) how it works --- .../sharing-experiments.md | 45 +++++++++++++------ 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 2eb800ad8e..6233b04b73 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,21 +1,38 @@ # Sharing Experiments -DVC can rely on existing Git servers or hosting (e.g. GitHub, GitLab, etc.) to -save and share DVC Experiments. - -Sharing experiments is similar to [sharing regular project data] and artifacts -by synchronizing with DVC and Git remotes. DVC-tracked data, models, etc. are in -your project's cache and thus will be transferred to/from -[remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google -Drive). Small files like experimental code and -[DVC metafiles](/doc/user-guide/project-structure) files are uploaded or -downloaded to/from [Git remotes] by DVC.
- -[sharing regular project data]: /doc/use-cases/sharing-data-and-model-files +Saving and sharing experiments is similar to [sharing regular project versions], +done by synchronizing with DVC and Git remotes. DVC takes care of pushing and +pulling to/from Git in the case of experiments, however. + +``` + ┌────────────────┐ ┌─────────────────┐ + ├────────────────┤ │ │ + │ DVC remote │ │ Git remote │ + │ storage │ ├─────────────────┤ + └────────────────┘ └─────────────────┘ + ▲ ▲ + │ dvc exp push │ + │ │ + ┌────────┴────────┐ ┌────────┴────────┐ + │ Cached data │ │ Code and │ + │ artifacts │ │ metafiles │ + │ │ │ │ + └─────────────────┘ └─────────────────┘ +``` + +Specifically, data, models, etc. are tracked and cached by DVC and +thus will be transferred to/from [remote storage](/doc/command-reference/remote) +(e.g. Amazon S3 or Google Drive). Small files like code and +[DVC metafiles](/doc/user-guide/project-structure) are uploaded or downloaded +to/from [Git remotes] by DVC. + +[sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -You can check you have the necessary remotes setup with `git remote -v` and -`dvc remote list`: +## Preparation + +Make sure that you have the necessary remotes setup. Let's confirm with +`git remote -v` and `dvc remote list`: ```dvc $ git remote -v From a3b05414531d1b243c8c80244d06552afd3210a8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Dec 2021 13:54:41 -0600 Subject: [PATCH 49/56] guide: fix SSH URLS link in Exp Sharing... 
--- .../user-guide/experiment-management/sharing-experiments.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 6233b04b73..c7151747ed 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -45,7 +45,7 @@ storage s3://mybucket/my-dvc-store > ⚠️ Note that DVC can only authenticate with Git remotes using [SSH URLs]. -[ssh git urls]: +[ssh urls]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols ## Uploading experiments From ebef902c24b20e78609c8bd9dbb108d405419385 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 14 Dec 2021 20:44:40 -0700 Subject: [PATCH 50/56] exp: roll back unrelated changes --- content/docs/command-reference/exp/run.md | 2 +- content/docs/sidebar.json | 24 +++++++++---------- .../user-guide/experiment-management/index.md | 4 ++-- content/docs/user-guide/troubleshooting.md | 9 ++++--- 4 files changed, 19 insertions(+), 20 deletions(-) diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 4dd24b6896..869f9a7b8a 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -44,7 +44,7 @@ name like `exp-bfe64` by default, which can be customized using the `--name` Experiments are custom [Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References) (found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked -out by DVC). Note that these commits are not pushed to Git remotes by default +out by DVC). Note that these commits are not pushed to the Git remote by default (see `dvc exp push`).
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 65fb2e7117..f68a665d58 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -132,6 +132,18 @@ } ] }, + { + "label": "How To", + "slug": "how-to", + "source": false, + "children": [ + "stop-tracking-data", + "update-tracked-data", + "add-deps-or-outs-to-a-stage", + "merge-conflicts", + "share-a-dvc-cache" + ] + }, { "label": "Experiment Management", "slug": "experiment-management", @@ -146,18 +158,6 @@ "checkpoints" ] }, - { - "slug": "how-to", - "source": false, - "children": [ - "stop-tracking-data", - "update-tracked-data", - "add-deps-or-outs-to-a-stage", - "merge-conflicts", - "share-a-dvc-cache", - "share-many-experiments" - ] - }, "setup-google-drive-remote", "large-dataset-optimization", "external-dependencies", diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index b032c7e18e..b501fd0a16 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -47,8 +47,8 @@ main alternatives: - **Git tags and branches** - use the repo's "time dimension" to distribute your experiments. This makes the most sense for experiments that build on each - other. Git-based experiment structures are especially helpful along with Git - history exploration tools + other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can + be easily visualized, for example with tools [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). 
- **Directories** - the project's "space dimension" can be structured with diff --git a/content/docs/user-guide/troubleshooting.md b/content/docs/user-guide/troubleshooting.md index 2eef09b77f..4804adb508 100644 --- a/content/docs/user-guide/troubleshooting.md +++ b/content/docs/user-guide/troubleshooting.md @@ -92,17 +92,16 @@ using: $ dvc checkout --relink ``` -## DVC can only authenticate with Git remotes using SSH URLs {#git-auth} +## HTTP Git authentication is not supported {#git-auth} [Experiment sharing](/doc/user-guide/experiment-management/sharing-experiments) -commands accept a `git_remote` argument. You may need to authenticate to use the -Git remote, for _write_ (`dvc exp push`) or _read_ (`dvc exp list`, +commands accept a `git_remote` argument. In order to access the Git remote, you +may need to authenticate for _write_ (`dvc exp push`) or _read_ (`dvc exp list`, `dvc exp pull`) permissions. DVC does not currently support authentication with [Git credentials]. This means that unless the Git server allows unauthenticated HTTP write/read, you should -use an [SSH Git URL] for Git remotes used for listing, pulling or pushing -experiments. +use an [SSH Git URL] when listing, pulling or pushing experiments. 
[git credentials]: https://git-scm.com/docs/gitcredentials [ssh git url]: From 15cb714d0d508498d057c9a6433f66e25df09948 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 16 Dec 2021 13:25:29 -0700 Subject: [PATCH 51/56] guide: Git -> Git remote per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-833588777 --- .../user-guide/experiment-management/sharing-experiments.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index c7151747ed..89c98f06d5 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -2,7 +2,7 @@ Saving and sharing experiments is similar to [sharing regular project versions], done by synchronizing with DVC and Git remotes. DVC takes care of pushing and -pulling to/from Git in the case of experiments, however. +pulling to/from Git remotes in the case of experiments, however. ``` ┌────────────────┐ ┌─────────────────┐ @@ -23,8 +23,8 @@ pulling to/from Git in the case of experiments, however. Specifically, data, models, etc. are tracked and cached by DVC and thus will be transferred to/from [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google Drive). Small files like code and -[DVC metafiles](/doc/user-guide/project-structure) are uploaded or downloaded -to/from [Git remotes] by DVC. +[DVC metafiles](/doc/user-guide/project-structure) are tracked by Git, so DVC +uploads and downloads them to/from your existing [Git remotes]. 
[sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From 675aacec626048de91ccc11b3dd55b38816371cb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 17 Dec 2021 00:13:23 -0700 Subject: [PATCH 52/56] guide: improve Sharing exp intro per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-833590828 --- .../sharing-experiments.md | 25 +++++++++++++------ 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 89c98f06d5..3975cf557f 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,8 +1,14 @@ # Sharing Experiments -Saving and sharing experiments is similar to [sharing regular project versions], -done by synchronizing with DVC and Git remotes. DVC takes care of pushing and -pulling to/from Git remotes in the case of experiments, however. +Regular DVC repository versions are typically synchronized among +team members per regular Git workflow. [DVC Experiments] are directly woven into +this commit history. But they also intend to avoid cluttering shared repos, so +by default they will only exist in the local environment where they were [run]. + +You must explicitly save or share experiments individually on other locations. +This is done similarly to [sharing regular project versions], by synchronizing +with DVC and Git remotes. But DVC takes care of pushing and pulling to/from Git +remotes in the case of experiments. ``` ┌────────────────┐ ┌─────────────────┐ @@ -20,12 +26,15 @@ pulling to/from Git remotes in the case of experiments, however. └─────────────────┘ └─────────────────┘ ``` -Specifically, data, models, etc. 
are tracked and cached by DVC and -thus will be transferred to/from [remote storage](/doc/command-reference/remote) -(e.g. Amazon S3 or Google Drive). Small files like code and -[DVC metafiles](/doc/user-guide/project-structure) are tracked by Git, so DVC -uploads and downloads them to/from your existing [Git remotes]. +> Specifically, data, models, etc. are tracked and cached by DVC +> and thus will be transferred to/from +> [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google +> Drive). Small files like [DVC metafiles](/doc/user-guide/project-structure) +> and code are tracked by Git, so DVC pushes and pulls them to/from your +> existing [Git remotes]. +[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview +[run]: /doc/user-guide/experiment-management/running-experiments [sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From a9988b0b509df73bc3d13c3429548e96e85f74dd Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 17 Dec 2021 00:35:25 -0700 Subject: [PATCH 53/56] exp push/pull: remove --remote and --jobs details from guide and ref descs. rel. https://github.com/iterative/dvc.org/pull/2908#issuecomment-995385880 --- content/docs/command-reference/exp/pull.md | 2 -- content/docs/command-reference/exp/push.md | 2 -- .../experiment-management/sharing-experiments.md | 13 ------------- 3 files changed, 17 deletions(-) diff --git a/content/docs/command-reference/exp/pull.md b/content/docs/command-reference/exp/pull.md index 6b74bb59d9..e15b4a2bf4 100644 --- a/content/docs/command-reference/exp/pull.md +++ b/content/docs/command-reference/exp/pull.md @@ -37,8 +37,6 @@ your local experiments. By default, this command will also try to [pull](/doc/command-reference/pull) all cached data associated with the experiment to DVC [remote storage](/doc/command-reference/remote), unless `--no-cache` is used. 
-The default remote is used (see `dvc remote default`) unless a specific one is -given with `--remote`. > 💡 Note that `git push --delete ` can be used to > delete a pushed experiment. diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md index 92e9e764d3..f2a8ac0fd5 100644 --- a/content/docs/command-reference/exp/push.md +++ b/content/docs/command-reference/exp/push.md @@ -37,8 +37,6 @@ to see experiments in the remote. This command will also try to [push](/doc/command-reference/push) all cached data associated with the experiment to DVC [remote storage](/doc/command-reference/remote), unless `--no-cache` is used. -The default remote is used (see `dvc remote default`) unless a specific one is -given with `--remote`. ## Options diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 3975cf557f..d52eecbf48 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -69,14 +69,6 @@ arguments. $ dvc exp push origin exp-abc123 ``` -The [default DVC remote](/doc/command-reference/remote/default) is used unless -one is specified with the `--remote` (`-r`) option. To prevent pushing -DVC-tracked files to remote storage altogether, use the `--no-cache` option. - -DVC can use multiple threads to upload files (4 per CPU core by default). You -can set the number with `--jobs` (`-j`). Please note that increases in -performance also depend on the connection bandwidth and remote configurations. - Once pushed, you can easily [list remote experiments] (with `dvc exp list`). To pus @@ -100,11 +92,6 @@ This pulls all the necessary files from both remotes. Again, you need to have both of these configured (see this [earlier section](#prepare-remotes-to-share-experiments)). -You can specify a remote to pull from with `--remote` (`-r`). 
- -DVC can use multiple threads to download files (4 per CPU core typically). You -can set the number with `--jobs` (`-j`). - If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. From 3b7a7e9936205f03b7c19bef352f6fc00682b95d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 17 Dec 2021 00:37:31 -0700 Subject: [PATCH 54/56] guide: remove Sharing Exps example per https://github.com/iterative/dvc.org/pull/2908#issuecomment-996500895 --- .../sharing-experiments.md | 50 ------------------- 1 file changed, 50 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index d52eecbf48..29e246817a 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -94,53 +94,3 @@ both of these configured (see this If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. - -## Example: Dedicated experiment directories - -A good way to isolate experiments is to create a separate directory outside the -current repository for each one. - -> Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here -> we'll see how to use `dvc exp pull` to copy an experiment. - -Suppose there is a DVC repo in `~/my-project` with multiple experiments. Let's -create a copy of experiment `exp-abc12` from it. First, clone the repo into -another directory: - -```dvc -$ git clone ~/my-project ~/my-experiment -$ cd ~/my-experiment -``` - -Git sets the `origin` remote of the cloned repo to `~/my-project`, so you can -see your all experiments from `~/my-experiment` like this: - -```dvc -$ dvc exp list origin -main: - exp-abc12 - ... 
-``` - -If the original repository doesn't have a `dvc remote`, you can define its -cache as the clone's remote storage: - -```dvc -$ dvc remote add --local --default storage ~/my-project/.dvc/cache -``` - -> ⚠️ `--local` is important here, so that the configuration changes don't -> accidentally get to the original repo. - -Having a DVC remote (and assuming the experiments have been pushed or cached -there) you can `dvc exp pull` the one in question; You can then can -`dvc exp apply` it and get a workspace that contains all of its -files: - -```dvc -$ dvc exp pull origin exp-abc12 -$ dvc exp apply exp-abc12 -``` - -Now you have a separate repo directory for your experiment, containing all its -artifacts! From 0fb52cb392c0ada446388f0ee5bb88d35b1a729b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 21 Dec 2021 20:11:51 -0700 Subject: [PATCH 55/56] guide: simplify Sharing Exps intro per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-835727834 --- .../experiment-management/sharing-experiments.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 29e246817a..e4c8957334 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,9 +1,10 @@ # Sharing Experiments -Regular DVC repository versions are typically synchronized among -team members per regular Git workflow. [DVC Experiments] are directly woven into -this commit history. But they also intend to avoid cluttering shared repos, so -by default they will only exist in the local environment where they were [run]. +In a regular Git workflow, DVC repository versions are typically +synchronized among team members. And [DVC Experiments] are internally connected +to this commit history. 
But to avoid cluttering everyone's copies of the repo, +by default experiments will only exist in the local environment where they were +[created]. You must explicitly save or share experiments individually on other locations. This is done similarly to [sharing regular project versions], by synchronizing @@ -34,7 +35,7 @@ remotes in the case of experiments. > existing [Git remotes]. [dvc experiments]: /doc/user-guide/experiment-management/experiments-overview -[run]: /doc/user-guide/experiment-management/running-experiments +[created]: /doc/user-guide/experiment-management/running-experiments [sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files [git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes From d6de4f5784d80d44b76942027d9356d5472271e8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 21 Dec 2021 20:36:23 -0700 Subject: [PATCH 56/56] guide: add exp pull to diagram in Sharing Exps per https://github.com/iterative/dvc.org/pull/2908#issuecomment-997117821 --- .../sharing-experiments.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index e4c8957334..2d884262c3 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -12,19 +12,19 @@ with DVC and Git remotes. But DVC takes care of pushing and pulling to/from Git remotes in the case of experiments. 
``` - ┌────────────────┐ ┌─────────────────┐ - ├────────────────┤ │ │ - │ DVC remote │ │ Git remote │ - │ storage │ ├─────────────────┤ - └────────────────┘ └─────────────────┘ - ▲ ▲ - │ dvc exp push │ - │ │ - ┌────────┴────────┐ ┌────────┴────────┐ - │ Cached data │ │ Code and │ - │ artifacts │ │ metafiles │ - │ │ │ │ - └─────────────────┘ └─────────────────┘ + ┌────────────────┐ ┌────────────────┐ + ├────────────────┤ │ │ Remote locations + │ DVC remote │ │ Git remote │ + │ storage │ ├────────────────┤ + └────────────────┘ └────────────────┘ + ▲ ▲ + │ dvc exp push │ + │ dvc exp pull │ + ▼ ▼ + ┌─────────────────┐ ┌────────────────┐ + │ │ │ Code and │ + │ Cached data │ │ metafiles │ Local project + └─────────────────┘ └────────────────┘ ``` > Specifically, data, models, etc. are tracked and cached by DVC