From 3fab9cd06edbba5ed7f1ff7dc534dac447d9016d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 23 Dec 2021 00:07:44 -0600 Subject: [PATCH] guide: revisit Exp Sharing (#2908) * guide: split Experiments (index) into sub-pages * case: keep Persistent Exps in basic page * cases: keep Run-cache in basic Exps page * guide: edit Exp Mgmt index (intro) * guide: edit basic Exps page inc. persisting them and move run-cache to guide intro (index) * guide: rename DVC Exps, remove Org Exps page * guide: bash -> dvc in EM/Checkpoints * guide: fix exps link * guide: summarize Sharing Exps intro * ref: link from exp push/pull to Exp Sharing guide * Update content/docs/user-guide/experiment-management/sharing-experiments.md * guide: rename Exp Sharing sections * guide: summarize Exp Sharing examples * guide: link from Exp Mgmt index to Sharing * guide: ~~isolate~~ from link to Exp Sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-730211648 * Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: David de la Iglesia Castro * guide: mention only SSH Git URLs support exp sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-732522579 * guide: update dvc remote example in sharing exps * yarn format some files per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145 * guide: consolidate Exp Sharing intro (#2711) * guide: summarize Sharing Exps intro * ref: link from exp push/pull to Exp Sharing guide * Update content/docs/user-guide/experiment-management/sharing-experiments.md * guide: link from Exp Mgmt index to Sharing * guide: ~~isolate~~ from link to Exp Sharing per https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-730211648 * Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: David de la Iglesia Castro * guide: mention only SSH Git URLs support exp sharing per 
https://github.com/iterative/dvc.org/pull/2711#pullrequestreview-732522579 * guide: update dvc remote example in sharing exps * yarn format some files per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145 Co-authored-by: David de la Iglesia Castro * prettier sharing-experiments.md * Update content/docs/user-guide/experiment-management/sharing-experiments.md Co-authored-by: Casper da Costa-Luis * guide: roll back wrong files * guide: roll back Exp Mgmt index... * guide: link to Sharing Exps from index * guide: Listing exps on remotes per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775502448 * guide: don't mention Git here... per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775506755 * guide: clarify that git is needed for exps and sharing per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775506990 * guide: clarify note on Git requirement for DVC Exps per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775743738 * guide: simplify Sharing Exps intro (rel Git) per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775746073 * guide: rename exp list -r section * copy edit * cases: simplify note about requiring Git per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-775743738 * guide: emoji for example in Sharing Exps per https://github.com/iterative/dvc.org/pull/2908#discussion_r737121788 * guide: clarify note about Git-DVC repo required for Exps per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-776668543 * Update content/docs/user-guide/experiment-management/sharing-experiments.md * guide: another example emoji en Sharing Exps * Restyled by prettier (#2972) Co-authored-by: Restyled.io * guide: list exps in Comparing guide, linked from Sharing per https://github.com/iterative/dvc.org/pull/2908#discussion_r725406368 * guide: address feedback from 
https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-794875761 and below * guide: rephrase Git history exps org per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-794875761 * guide: address Exp sharing feedback from https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-799490975 and below * guide: update Git remote auth limitation wording per https://github.com/iterative/dvc.org/pull/2908#discussion_r740764092 * guide: more copy edits on Exp Sharing and Comparing * guide: clarify `exp list` remote info per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-798148999 * guide: unhide exp sharing details per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-811970105 * guide: move multi-exp share example to how-to per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-811970777 * guide: simplify Exp Sharing intro, add diagram per should be focusing more on explaining (in simple terms, with diagrams) how it works * guide: fix SSH URLS link in Exp Sharing... * exp: roll back unrelated changes * guide: Git -> Git remote per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-833588777 * guide: improve Sharing exp intro per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-833590828 * exp push/pull: remove --remote and --jobs details from guide and ref descs. rel. 
https://github.com/iterative/dvc.org/pull/2908#issuecomment-995385880 * guide: remove Sharing Exps example per https://github.com/iterative/dvc.org/pull/2908#issuecomment-996500895 * guide: simplify Sharing Exps intro per https://github.com/iterative/dvc.org/pull/2908#pullrequestreview-835727834 * guide: add exp pull to diagram in Sharing Exps per https://github.com/iterative/dvc.org/pull/2908#issuecomment-997117821 Co-authored-by: David de la Iglesia Castro Co-authored-by: Casper da Costa-Luis Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io --- content/docs/command-reference/exp/pull.md | 8 +- content/docs/command-reference/exp/push.md | 8 +- .../comparing-experiments.md | 5 +- .../sharing-experiments.md | 216 +++++------------- .../how-to/share-many-experiments.md | 19 ++ 5 files changed, 87 insertions(+), 169 deletions(-) create mode 100644 content/docs/user-guide/how-to/share-many-experiments.md diff --git a/content/docs/command-reference/exp/pull.md b/content/docs/command-reference/exp/pull.md index eeeaad9f8a..e15b4a2bf4 100644 --- a/content/docs/command-reference/exp/pull.md +++ b/content/docs/command-reference/exp/pull.md @@ -17,8 +17,10 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. + +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with experiments because these are > saved under custom Git references. See **How does DVC track experiments?** in @@ -35,8 +37,6 @@ your local experiments. 
By default, this command will also try to [pull](/doc/command-reference/pull) all cached data associated with the experiment to DVC [remote storage](/doc/command-reference/remote), unless `--no-cache` is used. -The default remote is used (see `dvc remote default`) unless a specific one is -given with `--remote`. > 💡 Note that `git push --delete ` can be used to > delete a pushed experiment. diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md index 1c1d1195aa..f2a8ac0fd5 100644 --- a/content/docs/command-reference/exp/push.md +++ b/content/docs/command-reference/exp/push.md @@ -17,8 +17,10 @@ positional arguments: ## Description -The `dvc exp push` and `dvc exp pull` commands are the means for sharing -experiments across repository copies via Git (and DVC) remotes. +The `dvc exp push` and `dvc exp pull` commands are the means for [sharing +experiments] across repository copies via Git and DVC remotes. + +[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments > Plain `git push` and `git fetch` don't work with experiments because these are > saved under custom Git references. See **How does DVC track experiments?** in @@ -35,8 +37,6 @@ to see experiments in the remote. This command will also try to [push](/doc/command-reference/push) all cached data associated with the experiment to DVC [remote storage](/doc/command-reference/remote), unless `--no-cache` is used. -The default remote is used (see `dvc remote default`) unless a specific one is -given with `--remote`. 
## Options diff --git a/content/docs/user-guide/experiment-management/comparing-experiments.md b/content/docs/user-guide/experiment-management/comparing-experiments.md index 5974896ad9..93dc87e173 100644 --- a/content/docs/user-guide/experiment-management/comparing-experiments.md +++ b/content/docs/user-guide/experiment-management/comparing-experiments.md @@ -41,8 +41,9 @@ refs/tags/baseline-experiment: cnn-64 ``` -This command lists remote experiments originated from `HEAD`. You can add any -other options to the remote command, including `--all` (see previous section). +This command lists remote experiments based on that repo's `HEAD`. You can use +`--all` to list all experiments, or add any other supported option to the remote +`dvc exp list` command. [shared]: /doc/user-guide/experiment-management/sharing-experiments diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index a4512870b1..2d884262c3 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,199 +1,97 @@ # Sharing Experiments -There are two types of remotes that can store experiments. Git remotes are -distributed copies of the Git repository, for example on GitHub or GitLab. +In a regular Git workflow, DVC repository versions are typically +synchronized among team members. And [DVC Experiments] are internally connected +to this commit history. But to avoid cluttering everyone's copies of the repo, +by default experiments will only exist in the local environment where they were +[created]. -[DVC remotes](/doc/command-reference/remote) on the other hand are -storage-specific locations (e.g. Amazon S3 or Google Drive) which we can -configure with `dvc remote`. DVC uses them to store and fetch large files that -don't normally fit inside Git repos. 
+You must explicitly save or share experiments individually on other locations. +This is done similarly to [sharing regular project versions], by synchronizing +with DVC and Git remotes. But DVC takes care of pushing and pulling to/from Git +remotes in the case of experiments. -DVC needs both kinds of remotes for backing up and sharing experiments. +``` + ┌────────────────┐ ┌────────────────┐ + ├────────────────┤ │ │ Remote locations + │ DVC remote │ │ Git remote │ + │ storage │ ├────────────────┤ + └────────────────┘ └────────────────┘ + ▲ ▲ + │ dvc exp push │ + │ dvc exp pull │ + ▼ ▼ + ┌─────────────────┐ ┌────────────────┐ + │ │ │ Code and │ + │ Cached data │ │ metafiles │ Local project + └─────────────────┘ └────────────────┘ +``` -Experiment files that are normally tracked in Git (like code versions) are -shared using Git remotes, and files or directories tracked with DVC (like -datasets) are shared using DVC remotes. +> Specifically, data, models, etc. are tracked and cached by DVC +> and thus will be transferred to/from +> [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google +> Drive). Small files like [DVC metafiles](/doc/user-guide/project-structure) +> and code are tracked by Git, so DVC pushes and pulls them to/from your +> existing [Git remotes]. -> See [Git remotes guide] and `dvc remote add` for information on setting them -> up. +[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview +[created]: /doc/user-guide/experiment-management/running-experiments +[sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files +[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -[git remotes guide]: - https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes +## Preparation -Normally, there should already be a Git remote called `origin` when you clone a -repo. Use `git remote -v` to list your Git remotes: +Make sure that you have the necessary remotes setup. 
Let's confirm with +`git remote -v` and `dvc remote list`: ```dvc $ git remote -v -origin https://github.com/iterative/example-dvc-experiments (fetch) -origin https://github.com/iterative/example-dvc-experiments (push) -``` - -Similarly, you can see the DVC remotes in you project using `dvc remote list`: +origin git@github.com:iterative/get-started-experiments.git (fetch) +origin git@github.com:iterative/get-started-experiments.git (push) -```dvc $ dvc remote list -storage https://remote.dvc.org/example-dvc-experiments -``` - -## Uploading experiments to remotes - -You can upload an experiment and its files to both remotes using `dvc exp push` -(requires the Git remote name and experiment name as arguments). - -```dvc -$ dvc exp push origin exp-abc123 +storage s3://mybucket/my-dvc-store ``` -> Use `dvc exp show` to find experiment names. - -This pushes the necessary DVC-tracked files from the cache to the default DVC -remote (similar to `dvc push`). You can prevent this behavior by using the -`--no-cache` option to the command above. - -If there's no default DVC remote, it will ask you to define one with -`dvc remote default`. If you don't want a default remote, or if you want to use -a different remote, you can specify one with the `--remote` (`-r`) option. - -DVC can use multiple threads to upload files (4 per CPU core by default). You -can set the number with `--jobs` (`-j`). Please note that increases in -performance also depend on the connection bandwidth and remote configurations. +> ⚠️ Note that DVC can only authenticate with Git remotes using [SSH URLs]. -> 📖 See also the [run-cache] mechanism. 
+[ssh urls]: + https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols -[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache +## Uploading experiments -## Listing experiments remotely +You can upload an experiment with all of its files and data using +`dvc exp push`, which takes a Git remote name and an experiment ID or name as +arguments. -In order to list experiments in a DVC project, you can use the `dvc exp list` -command. With no command line options, it lists the experiments in the current -project. - -You can supply a Git remote name to list the experiments: +> 💡 You can use `dvc exp show` to find experiment names. ```dvc -$ dvc exp list origin -main: - cnn-128 - cnn-32 - cnn-64 - cnn-96 +$ dvc exp push origin exp-abc123 ``` -Note that by default this only lists experiments derived from the current commit -(local `HEAD` or default remote branch). You can list all the experiments -(derived from from every branch and commit) with the `--all` option: +Once pushed, you can easily [list remote experiments] (with `dvc exp list`). To +pus -```dvc -$ dvc exp list origin --all -0b5bedd: - exp-9edbe -0f73830: - exp-280e9 - exp-4cd96 - ... -main: - cnn-128 - ... -``` +> See also [How to Share Many Experiments][share many]. -When you don't need to see the parent commits, you can list experiment names -only, with `--names-only`: - -```dvc -$ dvc exp list origin --names-only -cnn-128 -cnn-32 -cnn-64 -cnn-96 -``` +[list remote experiments]: + /doc/user-guide/experiment-management/comparing-experiments#list-experiments-saved-remotely +[share many]: /doc/user-guide/how-to/share-many-experiments -## Downloading experiments from remotes +## Downloading experiments When you clone a DVC repository, it doesn't fetch any experiments by default. 
In order to get them, use `dvc exp pull` (with the Git remote and the experiment name), for example: ```dvc -$ dvc exp pull origin cnn-64 +$ dvc exp pull origin cnn-32 ``` This pulls all the necessary files from both remotes. Again, you need to have both of these configured (see this [earlier section](#prepare-remotes-to-share-experiments)). -You can specify a remote to pull from with `--remote` (`-r`). - -DVC can use multiple threads to download files (4 per CPU core typically). You -can set the number with `--jobs` (`-j`). - If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. - -### Example: Pushing or pulling multiple experiments - -You can create a loop to upload or download all experiments like this: - -```dvc -$ dvc exp list --all --names-only | while read -r expname ; do \ - dvc exp pull origin ${expname} \ -done -``` - -> Without `--all`, only the experiments derived from the current commit will be -> pushed/pulled. - -## Example: Creating a directory for an experiment - -A good way to isolate experiments is to create a separate home directory for -each one. - -> Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here -> we'll see how to use `dvc exp pull` to copy an experiment. - -Suppose there is a DVC repository in `~/my-project` with multiple -experiments. Let's create a copy of experiment `exp-abc12` from there. - -First, clone the repo into another directory: - -```dvc -$ git clone ~/my-project ~/my-experiment -$ cd ~/my-experiment -``` - -Git sets the `origin` remote of the cloned repo to `~/my-project`, so you can -see your all experiments from `~/my-experiment` like this: - -```dvc -$ dvc exp list origin -main: - exp-abc12 - ... 
-``` - -If there is no DVC remote in the original repository, you can define its -cache as the clone's `dvc remote`: - -```dvc -$ dvc remote add --local --default storage ~/my-project/.dvc/cache -``` - -> ⚠️ `--local` is important here, so that the configuration change doesn't get -> to the original repo accidentally. - -If there's a DVC remote for the project, assuming the experiments have been -pushed there, you can pull the one in question: - -```dvc -$ dvc exp pull origin exp-abc12 -``` - -Then we can `dvc apply` this experiment and get a workspace that -contains all of its files: - -```dvc -$ dvc exp apply exp-abc12 -``` - -Now you have a dedicated directory for your experiment, containing all its -artifacts! diff --git a/content/docs/user-guide/how-to/share-many-experiments.md b/content/docs/user-guide/how-to/share-many-experiments.md new file mode 100644 index 0000000000..f98cb23c62 --- /dev/null +++ b/content/docs/user-guide/how-to/share-many-experiments.md @@ -0,0 +1,19 @@ +# How to Share Many Experiments + +`dvc exp push` and `dvc exp pull` allow us to [share experiments] between +repositories via existing DVC and Git remotes. These, however, work on individual +experiments. + +Here's a simple shell loop to pull all experiments from a Git remote (Linux): + +```dvc +$ dvc exp list origin --all --names-only | while read -r expname ; do \ + dvc exp pull origin ${expname} ; \ +done +``` + +> 📖 See [Listing Experiments] for more info on `dvc exp list`. + +[share experiments]: /doc/user-guide/experiment-management/sharing-experiments +[listing experiments]: + /doc/user-guide/experiment-management/comparing-experiments#list-experiments-in-the-project
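
The how-to mentions pushing as well as pulling, but its loop only covers the download direction. A minimal sketch of the upload direction could look like the following. It assumes that `dvc exp list --all --names-only` prints one local experiment name per line, and the function name `push_all_exps` and its default remote `origin` are hypothetical choices made for illustration, not part of DVC.

```shell
#!/bin/sh
# Sketch only: push every local experiment to a Git remote, one at a time.
# Assumption: `dvc exp list --all --names-only` prints one name per line.
# `push_all_exps` is a hypothetical helper name, not a DVC command.

push_all_exps() {
  remote="${1:-origin}"   # Git remote name; "origin" is an assumed default

  dvc exp list --all --names-only | while read -r expname; do
    # Skip blank lines defensively
    [ -n "$expname" ] || continue

    # Redirect stdin so the pushed command cannot consume the name list
    dvc exp push "$remote" "$expname" < /dev/null
  done
}
```

After sourcing the snippet, running `push_all_exps origin` would attempt one `dvc exp push` per experiment. Wrapping the loop in a function keeps the remote name in a single place and avoids the backslash line continuations of the one-liner form.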