guide: revisit Exp Sharing (#2908)
* guide: split Experiments (index) into sub-pages

* case: keep Persistent Exps in basic page

* cases: keep Run-cache in basic Exps page

* guide: edit Exp Mgmt index (intro)

* guide: edit basic Exps page inc. persisting them
and move run-cache to guide intro (index)

* guide: rename DVC Exps, remove Org Exps page

* guide: bash -> dvc in EM/Checkpoints

* guide: fix exps link

* guide: summarize Sharing Exps intro

* ref: link from exp push/pull to Exp Sharing guide

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

* guide: rename Exp Sharing sections

* guide: summarize Exp Sharing examples

* guide: link from Exp Mgmt index to Sharing

* guide: ~~isolate~~ from link to Exp Sharing
per #2711 (review)

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

Co-authored-by: David de la Iglesia Castro <[email protected]>

* guide: mention only SSH Git URLs support exp sharing
per #2711 (review)

* guide: update dvc remote example in sharing exps

* yarn format some files
per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145

* guide: consolidate Exp Sharing intro (#2711)

* guide: summarize Sharing Exps intro

* ref: link from exp push/pull to Exp Sharing guide

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

* guide: link from Exp Mgmt index to Sharing

* guide: ~~isolate~~ from link to Exp Sharing
per #2711 (review)

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

Co-authored-by: David de la Iglesia Castro <[email protected]>

* guide: mention only SSH Git URLs support exp sharing
per #2711 (review)

* guide: update dvc remote example in sharing exps

* yarn format some files
per https://app.circleci.com/pipelines/github/iterative/dvc.org/10086/workflows/9b1bf89f-a432-49f2-9a20-72fe77dd4102/jobs/10145

Co-authored-by: David de la Iglesia Castro <[email protected]>

* prettier sharing-experiments.md

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

Co-authored-by: Casper da Costa-Luis <[email protected]>

* guide: roll back wrong files

* guide: roll back Exp Mgmt index...

* guide: link to Sharing Exps from index

* guide: Listing exps on remotes
per #2908 (review)

* guide: don't mention Git here...
per #2908 (review)

* guide: clarify that git is needed for exps and sharing
per #2908 (review)

* guide: clarify note on Git requirement for DVC Exps
per #2908 (review)

* guide: simplify Sharing Exps intro (rel Git)
per #2908 (review)

* guide: rename exp list -r section

* copy edit

* cases: simplify note about requiring Git
per #2908 (review)

* guide: emoji for example in Sharing Exps
per #2908 (comment)

* guide: clarify note about Git-DVC repo required for Exps
per #2908 (review)

* Update content/docs/user-guide/experiment-management/sharing-experiments.md

* guide: another example emoji en Sharing Exps

* Restyled by prettier (#2972)

Co-authored-by: Restyled.io <[email protected]>

* guide: list exps in Comparing guide, linked from Sharing
per #2908 (comment)

* guide: address feedback from
#2908 (review) and below

* guide: rephrase Git history exps org
per #2908 (review)

* guide: address Exp sharing feedback
from #2908 (review) and below

* guide: update Git remote auth limitation wording
per #2908 (comment)

* guide: more copy edits on Exp Sharing and Comparing

* guide: clarify `exp list` remote info
per #2908 (review)

* guide: unhide exp sharing details
per #2908 (review)

* guide: move multi-exp share example to how-to
per #2908 (review)

* guide: simplify Exp Sharing intro, add diagram
per should be focusing more on explaining (in simple terms, with diagrams) how it works

* guide: fix SSH URLS link in Exp Sharing...

* exp: roll back unrelated changes

* guide: Git -> Git remote
per #2908 (review)

* guide: improve Sharing exp intro
per #2908 (review)

* exp push/pull: remove --remote and --jobs details from guide and ref descs.
rel. #2908 (comment)

* guide: remove Sharing Exps example
per #2908 (comment)

* guide: simplify Sharing Exps intro
per #2908 (review)

* guide: add exp pull to diagram in Sharing Exps
per #2908 (comment)

Co-authored-by: David de la Iglesia Castro <[email protected]>
Co-authored-by: Casper da Costa-Luis <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
5 people authored Dec 23, 2021
1 parent 33a303c commit 3fab9cd
Showing 5 changed files with 87 additions and 169 deletions.
8 changes: 4 additions & 4 deletions content/docs/command-reference/exp/pull.md
@@ -17,8 +17,10 @@ positional arguments:

## Description

The `dvc exp push` and `dvc exp pull` commands are the means for sharing
experiments across <abbr>repository</abbr> copies via Git (and DVC) remotes.
The `dvc exp push` and `dvc exp pull` commands are the means for [sharing
experiments] across <abbr>repository</abbr> copies via Git and DVC remotes.

[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments

> Plain `git push` and `git fetch` don't work with experiments because these are
> saved under custom Git references. See **How does DVC track experiments?** in
@@ -35,8 +37,6 @@ your local experiments.
By default, this command will also try to [pull](/doc/command-reference/pull)
all <abbr>cached</abbr> data associated with the experiment to DVC
[remote storage](/doc/command-reference/remote), unless `--no-cache` is used.
The default remote is used (see `dvc remote default`) unless a specific one is
given with `--remote`.

> 💡 Note that `git push <git_remote> --delete <experiment>` can be used to
> delete a pushed experiment.
8 changes: 4 additions & 4 deletions content/docs/command-reference/exp/push.md
@@ -17,8 +17,10 @@ positional arguments:

## Description

The `dvc exp push` and `dvc exp pull` commands are the means for sharing
experiments across <abbr>repository</abbr> copies via Git (and DVC) remotes.
The `dvc exp push` and `dvc exp pull` commands are the means for [sharing
experiments] across <abbr>repository</abbr> copies via Git and DVC remotes.

[sharing experiments]: /doc/user-guide/experiment-management/sharing-experiments

> Plain `git push` and `git fetch` don't work with experiments because these are
> saved under custom Git references. See **How does DVC track experiments?** in
@@ -35,8 +37,6 @@ to see experiments in the remote.
This command will also try to [push](/doc/command-reference/push) all
<abbr>cached</abbr> data associated with the experiment to DVC
[remote storage](/doc/command-reference/remote), unless `--no-cache` is used.
The default remote is used (see `dvc remote default`) unless a specific one is
given with `--remote`.

## Options

@@ -41,8 +41,9 @@ refs/tags/baseline-experiment:
cnn-64
```

This command lists remote experiments originated from `HEAD`. You can add any
other options to the remote command, including `--all` (see previous section).
This command lists remote experiments based on that repo's `HEAD`. You can use
`--all` to list all experiments, or add any other supported option to the remote
`dvc exp list` command.

[shared]: /doc/user-guide/experiment-management/sharing-experiments

216 changes: 57 additions & 159 deletions content/docs/user-guide/experiment-management/sharing-experiments.md
@@ -1,199 +1,97 @@
# Sharing Experiments

There are two types of remotes that can store experiments. Git remotes are
distributed copies of the Git repository, for example on GitHub or GitLab.
In a regular Git workflow, <abbr>DVC repository</abbr> versions are typically
synchronized among team members. And [DVC Experiments] are internally connected
to this commit history. But to avoid cluttering everyone's copies of the repo,
by default experiments will only exist in the local environment where they were
[created].

[DVC remotes](/doc/command-reference/remote) on the other hand are
storage-specific locations (e.g. Amazon S3 or Google Drive) which we can
configure with `dvc remote`. DVC uses them to store and fetch large files that
don't normally fit inside Git repos.
You must explicitly save or share experiments individually on other locations.
This is done similarly to [sharing regular project versions], by synchronizing
with DVC and Git remotes. But DVC takes care of pushing and pulling to/from Git
remotes in the case of experiments.

DVC needs both kinds of remotes for backing up and sharing experiments.
```
┌────────────────┐     ┌────────────────┐
├────────────────┤     │                │   Remote locations
│   DVC remote   │     │   Git remote   │
│    storage     │     ├────────────────┤
└────────────────┘     └────────────────┘
        ▲                       ▲
        │     dvc exp push      │
        │     dvc exp pull      │
        ▼                       ▼
┌─────────────────┐    ┌────────────────┐
│                 │    │   Code and     │
│   Cached data   │    │   metafiles    │   Local project
└─────────────────┘    └────────────────┘
```

Experiment files that are normally tracked in Git (like code versions) are
shared using Git remotes, and files or directories tracked with DVC (like
datasets) are shared using DVC remotes.
> Specifically, data, models, etc. are tracked and <abbr>cached</abbr> by DVC
> and thus will be transferred to/from
> [remote storage](/doc/command-reference/remote) (e.g. Amazon S3 or Google
> Drive). Small files like [DVC metafiles](/doc/user-guide/project-structure)
> and code are tracked by Git, so DVC pushes and pulls them to/from your
> existing [Git remotes].
> See [Git remotes guide] and `dvc remote add` for information on setting them
> up.
[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview
[created]: /doc/user-guide/experiment-management/running-experiments
[sharing regular project versions]: /doc/use-cases/sharing-data-and-model-files
[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes

[git remotes guide]:
https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
## Preparation

Normally, there should already be a Git remote called `origin` when you clone a
repo. Use `git remote -v` to list your Git remotes:
Make sure that you have the necessary remotes set up. Let's confirm with
`git remote -v` and `dvc remote list`:

```dvc
$ git remote -v
origin https://github.com/iterative/example-dvc-experiments (fetch)
origin https://github.com/iterative/example-dvc-experiments (push)
```

Similarly, you can see the DVC remotes in your project using `dvc remote list`:
origin git@github.com:iterative/get-started-experiments.git (fetch)
origin git@github.com:iterative/get-started-experiments.git (push)
```dvc
$ dvc remote list
storage https://remote.dvc.org/example-dvc-experiments
```

## Uploading experiments to remotes

You can upload an experiment and its files to both remotes using `dvc exp push`
(requires the Git remote name and experiment name as arguments).

```dvc
$ dvc exp push origin exp-abc123
storage s3://mybucket/my-dvc-store
```

> Use `dvc exp show` to find experiment names.
This pushes the necessary DVC-tracked files from the cache to the default DVC
remote (similar to `dvc push`). You can prevent this behavior by using the
`--no-cache` option to the command above.

If there's no default DVC remote, it will ask you to define one with
`dvc remote default`. If you don't want a default remote, or if you want to use
a different remote, you can specify one with the `--remote` (`-r`) option.

DVC can use multiple threads to upload files (4 per CPU core by default). You
can set the number with `--jobs` (`-j`). Please note that increases in
performance also depend on the connection bandwidth and remote configurations.
> ⚠️ Note that DVC can only authenticate with Git remotes using [SSH URLs].
> 📖 See also the [run-cache] mechanism.
[ssh urls]:
https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols#_the_protocols

[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache
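
Since only SSH URLs are supported for Git remote authentication, it may help to check a remote's URL before pushing. The following is a minimal sketch, not part of DVC: `is_ssh_url` is a hypothetical helper, and its pattern matching is a heuristic that assumes the scp-like form includes a user name (as in `git@github.com:...`).

```shell
# Hypothetical helper (not a DVC or Git command): succeeds if the given
# Git remote URL looks like an SSH URL. Heuristic only.
is_ssh_url() {
  case "$1" in
    ssh://*) return 0 ;;   # explicit ssh:// scheme
    *://*)   return 1 ;;   # any other scheme (https://, git://, file://, ...)
    *@*:*)   return 0 ;;   # scp-like syntax, e.g. git@github.com:user/repo.git
    *)       return 1 ;;   # anything else, e.g. a local path
  esac
}

# Usage (assumes a Git remote named "origin" exists):
#   is_ssh_url "$(git remote get-url origin)" || echo "origin is not an SSH URL"
```

If the check fails for an HTTPS remote, changing it to its SSH form with `git remote set-url` is one way to proceed.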
## Uploading experiments

## Listing experiments remotely
You can upload an experiment with all of its files and data using
`dvc exp push`, which takes a Git remote name and an experiment ID or name as
arguments.

In order to list experiments in a DVC project, you can use the `dvc exp list`
command. With no command line options, it lists the experiments in the current
project.

You can supply a Git remote name to list the experiments:
> 💡 You can use `dvc exp show` to find experiment names.
```dvc
$ dvc exp list origin
main:
cnn-128
cnn-32
cnn-64
cnn-96
$ dvc exp push origin exp-abc123
```

Note that by default this only lists experiments derived from the current commit
(local `HEAD` or default remote branch). You can list all the experiments
(derived from every branch and commit) with the `--all` option:
Once pushed, you can easily [list remote experiments] (with `dvc exp list`). To
pus

```dvc
$ dvc exp list origin --all
0b5bedd:
exp-9edbe
0f73830:
exp-280e9
exp-4cd96
...
main:
cnn-128
...
```
> See also [How to Share Many Experiments][share many].
When you don't need to see the parent commits, you can list experiment names
only, with `--names-only`:

```dvc
$ dvc exp list origin --names-only
cnn-128
cnn-32
cnn-64
cnn-96
```
[list remote experiments]:
/doc/user-guide/experiment-management/comparing-experiments#list-experiments-saved-remotely
[share many]: /doc/user-guide/how-to/share-many-experiments

## Downloading experiments from remotes
## Downloading experiments

When you clone a DVC repository, it doesn't fetch any experiments by default. In
order to get them, use `dvc exp pull` (with the Git remote and the experiment
name), for example:

```dvc
$ dvc exp pull origin cnn-64
$ dvc exp pull origin cnn-32
```

This pulls all the necessary files from both remotes. Again, you need to have
both of these configured (see this
[earlier section](#prepare-remotes-to-share-experiments)).

You can specify a remote to pull from with `--remote` (`-r`).

DVC can use multiple threads to download files (4 per CPU core typically). You
can set the number with `--jobs` (`-j`).

If an experiment being pulled already exists in the local project, DVC won't
overwrite it unless you supply `--force`.

### Example: Pushing or pulling multiple experiments

You can create a loop to upload or download all experiments like this:

```dvc
$ dvc exp list origin --all --names-only | while read -r expname ; do \
    dvc exp pull origin "${expname}" ; \
  done
```

> Without `--all`, only the experiments derived from the current commit will be
> pushed/pulled.
## Example: Creating a directory for an experiment

A good way to isolate experiments is to create a separate home directory for
each one.

> Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here
> we'll see how to use `dvc exp pull` to copy an experiment.
Suppose there is a <abbr>DVC repository</abbr> in `~/my-project` with multiple
experiments. Let's create a copy of experiment `exp-abc12` from there.

First, clone the repo into another directory:

```dvc
$ git clone ~/my-project ~/my-experiment
$ cd ~/my-experiment
```

Git sets the `origin` remote of the cloned repo to `~/my-project`, so you can
see all your experiments from `~/my-experiment` like this:

```dvc
$ dvc exp list origin
main:
exp-abc12
...
```

If there is no DVC remote in the original repository, you can define its
<abbr>cache</abbr> as the clone's `dvc remote`:

```dvc
$ dvc remote add --local --default storage ~/my-project/.dvc/cache
```

> ⚠️ `--local` is important here, so that the configuration change doesn't get
> to the original repo accidentally.
If there's a DVC remote for the project, assuming the experiments have been
pushed there, you can pull the one in question:

```dvc
$ dvc exp pull origin exp-abc12
```

Then we can `dvc exp apply` this experiment and get a <abbr>workspace</abbr>
that contains all of its files:

```dvc
$ dvc exp apply exp-abc12
```

Now you have a dedicated directory for your experiment, containing all its
artifacts!
19 changes: 19 additions & 0 deletions content/docs/user-guide/how-to/share-many-experiments.md
@@ -0,0 +1,19 @@
# How to Share Many Experiments

`dvc exp push` and `dvc exp pull` allow us to [share experiments] between
repositories via existing DVC and Git remotes. These, however, work on
individual experiments.

Here's a simple shell loop to push or pull all experiments (Linux):

```dvc
$ dvc exp list origin --all --names-only | while read -r expname ; do \
    dvc exp pull origin "${expname}" ; \
  done
```
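
The loop above only pulls. For the push direction, a reusable variant might look like the sketch below. `push_all_exps` is a hypothetical function name, and the `DVC_BIN` variable is an assumption added here only so the command can be swapped out for a dry run; neither is part of DVC.

```shell
# Hypothetical helper: push every experiment name read from stdin to the
# given Git remote. DVC_BIN is an illustrative override, defaulting to dvc.
push_all_exps() {
  git_remote="$1"
  while read -r expname; do
    # Skip blank lines in the input
    [ -n "$expname" ] || continue
    # </dev/null keeps the command from consuming the loop's stdin
    "${DVC_BIN:-dvc}" exp push "$git_remote" "$expname" </dev/null || return 1
  done
}

# Usage (lists local experiments and pushes each one to "origin"):
#   dvc exp list --all --names-only | push_all_exps origin
```

Setting `DVC_BIN=echo` prints the commands instead of running them, which can be handy to preview what would be pushed.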

> 📖 See [Listing Experiments] for more info on `dvc exp list`.
[share experiments]: /doc/user-guide/experiment-management/sharing-experiments
[listing experiments]:
/doc/user-guide/experiment-management/comparing-experiments#list-experiments-in-the-project
