guide: expand Experiments guide #2654

Closed · wants to merge 20 commits

Commits (20)
5008c55
guide: split Experiments (index) into sub-pages
jorgeorpinel Jul 21, 2021
ff85352
Merge branch 'master' into guide/exps
jorgeorpinel Jul 28, 2021
923040f
case: keep Persistent Exps in basic page
jorgeorpinel Jul 29, 2021
3ae85e5
cases: keep Run-cache in basic Exps page
jorgeorpinel Jul 29, 2021
29b17b2
guide: edit Exp Mgmt index (intro)
jorgeorpinel Jul 29, 2021
e21fef4
guide: edit basic Exps page inc. persisting them
jorgeorpinel Jul 29, 2021
c21dbe3
Merge branch 'master' into guide/exps
jorgeorpinel Aug 4, 2021
d8f2d7c
guide: rename DVC Exps, remove Org Exps page
jorgeorpinel Aug 4, 2021
1337453
guide: bash -> dvc in EM/Checkpoints
jorgeorpinel Aug 4, 2021
8d93521
guide: fix exps link
jorgeorpinel Aug 4, 2021
90f3042
Merge branch 'master' into guide/exps
jorgeorpinel Aug 11, 2021
fb4663c
Merge branch 'master' into guide/exps
jorgeorpinel Aug 18, 2021
d1422b1
guide: consolidate Exp Sharing intro (#2711)
jorgeorpinel Aug 18, 2021
532df56
Merge branch 'master' into guide/exps
jorgeorpinel Aug 18, 2021
0581991
guide: summarize Exp Sharing titles and examples (#2719)
jorgeorpinel Aug 20, 2021
e6d4eca
Merge branch 'master' into guide/exps
jorgeorpinel Oct 4, 2021
ec2ac41
Merge branch 'guide/exps' of github.com:iterative/dvc.org into guide/…
jorgeorpinel Oct 4, 2021
2e4a512
Merge branch 'master' into guide/exps
jorgeorpinel Oct 6, 2021
e4f4024
exp: fix links to old guides
jorgeorpinel Oct 6, 2021
581a9a9
guide: review links to Persistent Exps and Checkpoints info
jorgeorpinel Oct 6, 2021
2 changes: 1 addition & 1 deletion content/docs/sidebar.json
@@ -139,7 +139,7 @@
"label": "Experiment Management",
"slug": "experiment-management",
"source": "experiment-management/index.md",
"children": ["checkpoints"]
"children": ["experiments", "checkpoints", "organization"]
Member commented:

organization -> Organizing Project ?
organization is way too abstract ... essentially it's about how you structure the workflow?

to be honest very hard to tell if the existing structure makes sense - would be great to see more content that you have in mind for each section

Contributor (author) replied:

I merged the Organization Patterns section back into the index so it has more context. If we get more content for this we can split and rename.

},
"setup-google-drive-remote",
"large-dataset-optimization",
5 changes: 3 additions & 2 deletions content/docs/user-guide/basic-concepts/experiment.md
@@ -1,11 +1,12 @@
---
name: Experiment
match: [experiment, experiments]
match: [experiment, experiments, 'DVC experiments']
tooltip: >-
An attempt to reach desired/better/interesting results during data pipelining
or ML model development. DVC is designed to help [manage
experiments](/doc/start/experiments), having [built-in
mechanisms](/doc/user-guide/experiment-management) like the
[run-cache](/doc/user-guide/project-structure/internal-files#run-cache) and
the `dvc experiments` commands (available on DVC 2.0 and above).
the [`dvc experiments`](/doc/command-reference/exp) commands (available on DVC
2.0 and above).
---
80 changes: 35 additions & 45 deletions content/docs/user-guide/experiment-management/checkpoints.md
@@ -1,27 +1,32 @@
# Checkpoints

ML checkpoints are an important part of deep learning because ML engineers like
to save the model files at certain points during a training process.
_New in DVC 2.0_

With DVC experiments and checkpoints, you can:
To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This is especially helpful in machine learning, for
example to track the progress in deep learning techniques such as evolving
neural networks.

- Implement the best practice in deep learning to save your model weights as
_Checkpoint experiments_ track a series of variations (the checkpoints) and
their execution can be stopped and resumed as needed. You interact with them
using the `--rev` and `--reset` options of `dvc exp run` (see also the
`checkpoint` field in `dvc.yaml` `outs`). They can help you

- implement the best practice in deep learning to save your model weights as
checkpoints.
- Track all code and data changes corresponding to the checkpoints.
- See when metrics start diverging and revert to the optimal checkpoint.
- Automate the process of tracking every training epoch.
- track all code and data changes corresponding to the checkpoints.
- see when metrics start diverging and revert to the optimal checkpoint.
- automate the process of tracking every training epoch.
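The `checkpoint` field mentioned above is set per output in `dvc.yaml`. A minimal sketch (the stage and file names here are hypothetical):

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    outs:
      # Marking the output as a checkpoint lets `dvc exp run`
      # capture it at each step registered from the code.
      - model.pt:
          checkpoint: true
```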

[The way checkpoints are implemented by DVC](/blog/experiment-refs) utilizes
_ephemeral_ experiment commits and experiment branches within DVC. They are
created using the metadata from experiments and are tracked with the `exps`
custom Git reference.
> Experiments and checkpoints are [implemented](/blog/experiment-refs) with
> hidden Git commits and experiment branches.

You can add experiments to your Git history by committing the experiment you
want to track, which you'll see later in this tutorial.
Like with regular experiments, checkpoints can become persistent by
[committing them to Git](#committing-checkpoints-to-git).

This tutorial is going to cover how to implement checkpoints in an ML project
using DVC. We're going to train a model to identify handwritten digits based on
the MNIST dataset.
This guide covers how to implement checkpoints in an ML project using DVC. We're
going to train a model to identify handwritten digits based on the MNIST
dataset.
Member commented:

Not this PR: below: Setting up the project ... should we do ## -> ###?

This comment was marked as resolved.
@@ -62,9 +67,9 @@ everything you need to get started with experiments and checkpoints.

## Setting up a DVC pipeline

DVC versions data and it also can version the machine learning model weights
file as checkpoints during the training process. To enable this, you will need
to set up a DVC pipeline to train your model.
DVC versions data and it also can version the ML model weights file as
checkpoints during the training process. To enable this, you will need to set up
a DVC pipeline to train your model.

Adding a DVC pipeline only takes a few commands. At the root of the project,
run:
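For reference, a stage with a checkpoint output can be registered with `dvc stage add`. A sketch with assumed file names, not necessarily the exact commands used in this tutorial:

```dvc
$ dvc stage add -n train \
    -d train.py -d data/MNIST \
    --checkpoints model.pt \
    python train.py
```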
@@ -190,9 +195,8 @@ You can read about what the line `dvclive.log(k, v)` does in the

The [`dvclive.next_step()`](/doc/dvclive/api-reference/next_step) line tells DVC
that it can take a snapshot of the entire workspace and version it with Git.
It's important that with this approach only code with metadata is versioned in
Git (as an ephemeral commit), while the actual model weight file will be stored
in the DVC data cache.
It's important that with this approach only code with metadata is versioned,
while the actual model weight file will be stored in the DVC data cache.
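In the training loop, this pattern might look like the following sketch (`train_epoch` and `evaluate` are placeholders for your own functions, not part of the tutorial):

```python
import dvclive

for epoch in range(NUM_EPOCHS):
    train_epoch(model)             # your training step
    metrics = evaluate(model)      # e.g. {"acc": 0.98, "loss": 0.07}
    for name, value in metrics.items():
        dvclive.log(name, value)   # record metrics for this step
    dvclive.next_step()            # signal DVC to snapshot a checkpoint
```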

## Running experiments

@@ -407,39 +411,25 @@ new set of checkpoints under a new experiment branch.
└─────────────────────────┴──────────┴──────┴─────────┴────────┴────────┴────────┴──────────────┘
```

## Adding checkpoints to Git
## Committing checkpoints to Git

When you terminate training, you'll see a few commands in the terminal that will
allow you to add these changes to Git.
allow you to add these changes to Git, making them persistent:

```
```dvc
To track the changes with git, run:

git add dvclive.json dvc.yaml .gitignore train.py dvc.lock

Reproduced experiment(s): exp-263da
Experiment results have been applied to your workspace.

To promote an experiment to a Git branch run:

dvc exp branch <exp>
```

You can run the following command to save your experiments to the Git history.

```bash
$ git add dvclive.json dvc.yaml .gitignore train.py dvc.lock
...
```

You can take a look at what will be committed to your Git history by running:
Running the command above will stage the checkpoint experiment with Git. You can
take a look at what would be committed first with `git status`. You should see
something similar to this in your terminal:

```bash
```dvc
$ git status
```

You should see something similar to this in your terminal.

```
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitignore
@@ -456,7 +446,7 @@ Untracked files:
predictions.json
```

All that's left is to commit these changes with the following command:
All that's left to do is to `git commit` the changes:

```bash
$ git commit -m 'saved files from experiment'
33 changes: 33 additions & 0 deletions content/docs/user-guide/experiment-management/experiments.md
@@ -0,0 +1,33 @@
# Experiments

_New in DVC 2.0_

`dvc exp` commands let you automatically track a variation to an established
Member commented:

not for this PR: track a variation to an established data pipeline sounds very complicated.

dvc exp run, dvc exp show, and other dvc exp commands automatically capture and save experiment runs, including code, data, metrics, models, etc.

or even better - get rid of this sentence :) It does more harm than good to my mind.

[data pipeline](/doc/command-reference/dag). You can create multiple isolated
experiments this way, as well as review, compare, and restore them later, or
roll back to the baseline. The basic workflow goes like this:

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
Member commented:

not for this PR: Modify hyperparameters (link to params), code ... etc ... we are being way too formal here again. I think we can sacrifice it a bit and use some common ML terminology

source code) of committed stages.
- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Use `dvc metrics` to identify the best experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.
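Put together, one round of this workflow might look like this (the experiment IDs below are illustrative):

```dvc
$ dvc exp run --set-param train.lr=0.002  # tweak a param and run the pipeline
$ dvc exp show                            # review the table of experiments
$ dvc exp diff exp-1dad0 exp-9bc3a        # compare two of them
$ dvc exp apply exp-1dad0                 # restore the best one
```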

## Persistent Experiments

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow, `dvc.yaml` and `dvc.lock` will define the experiment as a
new project version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.

Use `dvc exp apply` and `dvc exp branch` to persist experiments in your Git
history.
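For example (with an illustrative experiment ID):

```dvc
$ dvc exp apply exp-e6c97                # bring its results into the workspace
$ git add . && git commit -m "tuned lr"  # persist as a regular commit
$ dvc exp branch exp-e6c97 tuned-lr      # or turn it into a Git branch
```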
139 changes: 45 additions & 94 deletions content/docs/user-guide/experiment-management/index.md
@@ -1,113 +1,64 @@
# Experiment Management

_New in DVC 2.0_

Data science and ML are iterative processes that require a large number of
attempts to reach a certain level of a metric. Experimentation is part of the
development of data features, hyperspace exploration, deep learning
optimization, etc. DVC helps you codify and manage all of your
<abbr>experiments</abbr>, supporting these main approaches:

1. Create [experiments](#experiments) that derive from your latest project
version without having to track them manually. DVC does that automatically,
letting you list and compare them. The best ones can be made persistent, and
the rest archived.
2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series
of variations, forming a deep experiment. DVC helps you capture them at
runtime, and manage them in batches.
3. Make experiments or checkpoints [persistent](#persistent-experiments) by
committing them to your <abbr>repository</abbr>. Or create these versions
from scratch like typical project changes.
optimization, etc.

At this point you may also want to consider the different
[ways to organize](#organization-patterns) experiments in your project (as
Git branches, as folders, etc.).

DVC also provides specialized features to codify and analyze experiments.
Some of DVC's base features already help you codify and analyze experiments.
[Parameters](/doc/command-reference/params) are simple values you can tweak in a
human-readable text file, which cause different behaviors in your code and
models. On the other end, [metrics](/doc/command-reference/metrics) (and
formatted text file; they cause different behaviors in your code and models. On
the other end, [metrics](/doc/command-reference/metrics) (and
[plots](/doc/command-reference/plots)) let you define, visualize, and compare
meaningful measures for the experimental results.

> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

## Experiments

`dvc exp` commands let you automatically track a variation to an established
[data pipeline](/doc/command-reference/dag). You can create multiple isolated
experiments this way, as well as review, compare, and restore them later, or
roll back to the baseline. The basic workflow goes like this:

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
source code) of committed stages.
- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Use [metrics](/doc/command-reference/metrics) to identify the best
experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.

## Checkpoints in source code

To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This allows you, for example, to track the progress
in deep learning techniques such as evolving neural networks.
quantitative measures of your results.

This kind of experiments track a series of variations (the checkpoints) and its
execution can be stopped and resumed as needed. You interact with them using
`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint`
field in `dvc.yaml` `outs`).
<details>

> 📖 To learn more, see the dedicated
> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide.
## 💡 Run Cache: Automatic Log of Stage Runs

## Persistent experiments
Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
Member commented:

sounds too much for the index page? and too abrupt to be honest (even though it is in the details section)

logs the unique signature of each stage run (in `.dvc/cache/runs` by default).
If it never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.
✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (can be disabled),
which means DVC is already saving all of your tests and experiments behind the
scenes. But there's no easy way to explore it.
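Conceptually, the run-cache behaves like a memoization table keyed by a hash of the stage's command, dependencies, and parameters. The following is only an illustrative sketch, not DVC's actual implementation or on-disk format:

```python
import hashlib
import json

run_log = {}  # signature -> cached result (stands in for .dvc/cache/runs)

def stage_signature(cmd, dep_hashes, params):
    """Hash everything that determines a stage run's outcome."""
    payload = json.dumps(
        {"cmd": cmd, "deps": dep_hashes, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_stage(cmd, dep_hashes, params, execute):
    sig = stage_signature(cmd, dep_hashes, params)
    if sig in run_log:
        return run_log[sig]  # same conditions: restore instantly
    result = execute()       # first time: actually run the command
    run_log[sig] = result
    return result
```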

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock`
pair in the <abbr>workspace</abbr> will codify the experiment as a new project
version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.
</details>

### Organization patterns
## DVC Experiments
Member commented:

same here - it's clear that this is about dvc and about experiments - what is the intention of this subsection? what is the intention behind the index page?


DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
experiments, it's up to you to decide how to organize them in your project.
These are the main alternatives:
_New in DVC 2.0_

- **Git tags and branches** - use the repo's "time dimension" to distribute your
experiments. This makes the most sense for experiments that build on each
other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can
be easily visualized, for example with tools
[like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).
- **Directories** - the project's "space dimension" can be structured with
directories (folders) to organize experiments. Useful when you want to see all
your experiments at the same time (without switching versions) by just
exploring the file system.
- **Hybrid** - combining an intuitive directory structure with a good repo
branching strategy tends to be the best option for complex projects.
Completely independent experiments live in separate directories, while their
progress can be found in different branches.
The `dvc experiments` features are designed to support these main approaches:

1. Create [experiments] that derive from your latest project version without
polluting your Git history. DVC tracks them for you, letting you list and
compare them. The best ones can be made persistent, and the rest left as
history or cleared.
1. [Queue] and process series of experiments based on a parameter search or
other modifications to your baseline.
1. Generate [checkpoints] during your code execution to analyze the internal
progress of deep experiments. DVC captures them at runtime, and can manage
them in batches.
1. Make experiments [persistent] by committing them to your
<abbr>repository</abbr> history.

[experiments]: /doc/user-guide/experiment-management/experiments
[queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution
[checkpoints]: /doc/user-guide/experiment-management/checkpoints
[persistent]:
/doc/user-guide/experiment-management/experiments#persistent-experiments

## Automatic log of stage runs (run-cache)
> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the
unique signature of each stage run (to `.dvc/cache/runs` by default). If it
never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.
You may also want to consider the different [ways to organize experiments] in
your project (as Git branches, as folders, etc.).

✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (but can be
disabled with the `--no-run-cache` command option).
[ways to organize experiments]:
/doc/user-guide/experiment-management/organization
20 changes: 20 additions & 0 deletions content/docs/user-guide/experiment-management/organization.md
@@ -0,0 +1,20 @@
# Organization Patterns

DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
experiments, it's up to you to decide how to organize them in your project.
These are the main alternatives:

- **Git tags and branches** - use the repo's "time dimension" to distribute your
experiments. This makes the most sense for experiments that build on each
other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can
be easily visualized, for example with tools
[like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).
- **Directories** - the project's "space dimension" can be structured with
directories (folders) to organize experiments. Useful when you want to see all
your experiments at the same time (without switching versions) by just
exploring the file system.
- **Hybrid** - combining an intuitive directory structure with a good repo
branching strategy tends to be the best option for complex projects.
Completely independent experiments live in separate directories, while their
progress can be found in different branches.
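For instance, the directories pattern might produce a layout like this (a hypothetical project):

```
experiments/
├── baseline/
│   ├── dvc.yaml
│   ├── dvc.lock
│   └── params.yaml
└── tuned-resnet/
    ├── dvc.yaml
    ├── dvc.lock
    └── params.yaml
```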