guide: DVC Experiments Overview #2909

Merged
merged 50 commits into from
Dec 13, 2021
Changes from 3 commits
5e43591
guide: add DVC Experiments page and links +
jorgeorpinel Oct 9, 2021
6b7300a
guide: remove checkpoint related changes
jorgeorpinel Oct 10, 2021
6027e15
guide: remove `dvc experiments` long cmd autolinks
jorgeorpinel Oct 10, 2021
7704a4d
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Oct 11, 2021
8f04899
guide: move run-cache section back to Exp Mgmt index bottom
jorgeorpinel Oct 11, 2021
3bfd2a9
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 1, 2021
0c2bcf5
guide: Exp Mgmt/ DVC Exps -> Exps Overview
jorgeorpinel Nov 1, 2021
27afdc1
guide: clear separation between Exp Mgmt index and Overview page
jorgeorpinel Nov 2, 2021
30db819
guide: single guide for Persisting Exps content and
jorgeorpinel Nov 2, 2021
aa3c5d0
guide: begin extracting Exp details from Running to Overview
jorgeorpinel Nov 2, 2021
7710433
guide: make ToC entry for Run Cache section
jorgeorpinel Nov 2, 2021
a133f70
Update content/docs/user-guide/experiment-management/index.md
jorgeorpinel Nov 4, 2021
32a269f
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 4, 2021
af94248
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Nov 17, 2021
dacaf85
[NESTED] guide: Exp implementation details, naming into Overview (#3006)
jorgeorpinel Nov 17, 2021
cab14da
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Nov 29, 2021
9a1e142
Merge branch 'exp/dvc-exps-page' of github.com:iterative/dvc.org into…
jorgeorpinel Nov 30, 2021
b40f340
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 30, 2021
73175a9
guide: emphasize dvc exps are not part of Git tree in overview
jorgeorpinel Nov 30, 2021
112ad87
guide: ID->name in dvc exps overview
jorgeorpinel Nov 30, 2021
9c2a55c
guide: ID->name in other exp guides
jorgeorpinel Nov 30, 2021
9b2902a
guide: Visualize->Review in exp/overview/basic-workflow
jorgeorpinel Nov 30, 2021
7b9384f
guide: don't say "cleans the slate" in exp/overview/basic-workflow
jorgeorpinel Nov 30, 2021
c9493f4
giude: soften params description in exps index
jorgeorpinel Nov 30, 2021
42454f0
guide: generalize dvc exps basic workflow
jorgeorpinel Nov 30, 2021
bd95136
guide: Properties section in DVC Exps overview page
jorgeorpinel Nov 30, 2021
6162f5a
guide: exp init section in Exp Overview page
jorgeorpinel Nov 30, 2021
63a9864
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 1, 2021
5043e64
guide: clarify dvc exp implementation
jorgeorpinel Dec 1, 2021
27f01e6
guide: expand on Exp Overview motivation
jorgeorpinel Dec 1, 2021
a799743
guide: direct language in Exp Overview/ workflow intro
jorgeorpinel Dec 1, 2021
59505f6
guide: mention metrics in exp init intro (Exp Overview)
jorgeorpinel Dec 1, 2021
3d0bede
guide: intro exp init before giving specific examples of what it does
jorgeorpinel Dec 1, 2021
db2d610
guide: hint forach stages for hybrid exp org pattern
jorgeorpinel Dec 1, 2021
f6eef79
guide: exp mgmt index copy edits
jorgeorpinel Dec 1, 2021
c68fc78
guide: mention label-based exp organization
jorgeorpinel Dec 1, 2021
3384af0
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 7, 2021
9fd3b3a
guide: hide exp naming section in overview page and
jorgeorpinel Dec 7, 2021
f241901
guide: mention `exp init -i` in Overview
jorgeorpinel Dec 7, 2021
e122b0a
guide: typo fix
jorgeorpinel Dec 7, 2021
659dd82
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Dec 7, 2021
73d510d
ref: exp apply copy edits
jorgeorpinel Dec 7, 2021
9d43ca6
ref: mention init before exp init
jorgeorpinel Dec 7, 2021
24c967d
guide: correct info aboug exp init in Exp Overview
jorgeorpinel Dec 7, 2021
439050e
ref: link from exp init to corresponding guide
jorgeorpinel Dec 7, 2021
3af2f9a
guide: make exp intro more concrete
jorgeorpinel Dec 8, 2021
12f8797
guide: rewrite exp init section of Exps Overview page
jorgeorpinel Dec 8, 2021
ad652a6
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 10, 2021
8aed622
ref: roll back unrelated ref changes (moved to ref/exp-misc)
jorgeorpinel Dec 10, 2021
c088a06
guide: roll back unrelated changes (moved to #3080)
jorgeorpinel Dec 10, 2021
2 changes: 1 addition & 1 deletion content/docs/command-reference/exp/apply.md
Expand Up @@ -20,7 +20,7 @@ can be referenced by name or hash (see `dvc exp run` for details).

This is typically used after choosing a target `experiment` with `dvc exp show`
or `dvc exp diff`, and before committing it to Git (making it
[persistent](/doc/user-guide/experiment-management#persistent-experiments)).
[persistent](/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments)).

`dvc exp apply` changes any files (code, data, <abbr>parameters</abbr>,
<abbr>metrics</abbr>, etc.) needed to reflect the experiment conditions and
6 changes: 3 additions & 3 deletions content/docs/command-reference/exp/branch.md
Expand Up @@ -18,9 +18,9 @@ positional arguments:
Makes a named Git
[`branch`](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging)
containing the target `experiment` (making it
[persistent](/doc/user-guide/experiment-management#persistent-experiments)). For
[checkpoint experiments](/doc/command-reference/exp/run#checkpoints), the new
branch will contain multiple commits (the checkpoints).
[persistent](/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments)).
For [checkpoint experiments](/doc/command-reference/exp/run#checkpoints), the
new branch will contain multiple commits (the checkpoints).

The new `branch` will be based on the experiment's parent commit (`HEAD` at the
time that the experiment was run). Note that DVC **does not** switch into the
14 changes: 7 additions & 7 deletions content/docs/command-reference/exp/run.md
Expand Up @@ -22,10 +22,10 @@ Provides a way to execute and track <abbr>experiments</abbr> in your
<abbr>project</abbr> without polluting it with unnecessary commits, branches,
directories, etc.

> `dvc exp run` is equivalent to `dvc repro` for experiments. It has the same
> behavior when it comes to `targets` and stage execution (restores the
> dependency graph, etc.). See the command [options](#options) for more on the
> differences.
> `dvc exp run` is equivalent to `dvc repro` for <abbr>experiments</abbr>. It
> has the same behavior when it comes to `targets` and stage execution (restores
> the dependency graph, etc.). See the command [options](#options) for more on
> the differences.

Before running an experiment, you'll probably want to make modifications such as
data and code updates, or <abbr>hyperparameter</abbr> tuning. For the latter,
Expand All @@ -44,7 +44,7 @@ option.
Experiments are custom
[Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked
out by DVC). Note that these commits are not pushed to the Git remote by default
out by DVC). Note that these commits are not pushed to Git remotes by default
(see `dvc exp push`).

</details>
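
As a rough illustration of the note above, the following Python sketch lists whatever loose reference files exist under `.git/refs/exps`. It is only a sketch under stated assumptions: the helper name `list_exp_refs` is hypothetical, the internal layout below `refs/exps` is not specified by this text, and packed refs are ignored.

```python
from pathlib import Path


def list_exp_refs(repo_root):
    """Return the names of loose refs stored under .git/refs/exps, sorted.

    Assumption: refs are stored as loose files; refs that Git has packed
    into .git/packed-refs are not visible to this helper.
    """
    git_dir = Path(repo_root) / ".git"
    refs_dir = git_dir / "refs" / "exps"
    if not refs_dir.is_dir():
        return []
    return sorted(
        p.relative_to(git_dir).as_posix()
        for p in refs_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Print the experiment refs of the repository in the current directory.
    for ref in list_exp_refs("."):
        print(ref)
```

In a real repository, `git for-each-ref refs/exps` should give a more complete listing, since it also covers packed refs.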
Expand All @@ -55,8 +55,8 @@ and compare multiple experiments, use `dvc exp show` or `dvc exp diff`
to restore the results of any other experiment instead.

Successful experiments can be made
[persistent](/doc/user-guide/experiment-management#persistent-experiments) by
committing them to the Git repo. Unnecessary ones can be removed with
[persistent](/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments)
by committing them to the Git repo. Unnecessary ones can be removed with
`dvc exp remove` or `dvc exp gc` (or abandoned).

> Note that experiment data will remain in the <abbr>cache</abbr> until you use
1 change: 1 addition & 0 deletions content/docs/sidebar.json
Expand Up @@ -147,6 +147,7 @@
"slug": "experiment-management",
"source": "experiment-management/index.md",
"children": [
"dvc-experiments",
"running-experiments",
"sharing-experiments",
"cleaning-experiments",
Expand Up @@ -2,7 +2,10 @@

Although DVC uses minimal resources to keep track of the experiments, they may
clutter tables and the workspace. DVC lets you remove specific experiments from
the workspace or delete all not-yet-persisted experiments at once.
the workspace or delete all not-yet-[persisted] experiments at once.

[persisted]:
/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments

## Removing specific experiments

36 changes: 36 additions & 0 deletions content/docs/user-guide/experiment-management/dvc-experiments.md
@@ -0,0 +1,36 @@
## DVC Experiments

_New in DVC 2.0_

`dvc exp` commands let you automatically track a variation to an established
[data pipeline](/doc/command-reference/dag) baseline. You can create multiple
isolated experiments this way, as well as review, compare, and restore them
later, or roll back to the baseline. The basic workflow goes like this:

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
source code) of committed stages.
- [Run experiments] with `dvc exp run` (instead of `repro`) to execute the
pipeline. The results are reflected in your <abbr>workspace</abbr>, and
tracked automatically.
- Use `dvc metrics` to identify the best experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.

[run experiments]: /doc/user-guide/experiment-management/running-experiments
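
The basic workflow above can be sketched as a sequence of command lines. The Python helper below only builds and prints them (it does not execute anything). The function name `exp_workflow`, the parameter `train.lr`, the experiment name `exp-1a2b3`, and the files passed to `git add` are hypothetical placeholders for illustration.

```python
import shlex


def exp_workflow(param_overrides, chosen_exp, message):
    """Build the command lines for one pass of the basic workflow.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # Run an experiment, overriding parameters on the fly.
    run = ["dvc", "exp", "run"]
    for name, value in param_overrides.items():
        run += ["--set-param", f"{name}={value}"]
    return [
        run,
        ["dvc", "exp", "show"],               # review all experiments
        ["dvc", "exp", "diff"],               # compare against the baseline
        ["dvc", "exp", "apply", chosen_exp],  # restore the chosen experiment
        # Assumed file names; stage whatever the experiment actually changed.
        ["git", "add", "dvc.lock", "params.yaml"],
        ["git", "commit", "-m", message],     # make the experiment persistent
    ]


for argv in exp_workflow({"train.lr": 0.01}, "exp-1a2b3", "Persist best experiment"):
    print(shlex.join(argv))
```

Keeping the commands as argument lists makes it straightforward to pass each one to `subprocess.run` later, if you prefer to drive the workflow from a script.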

## Persistent Experiments

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow, `dvc.yaml` and `dvc.lock` will define the experiment as a
new project version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.

Use `dvc exp apply` and `dvc exp branch` to persist experiments in your Git
history.
134 changes: 54 additions & 80 deletions content/docs/user-guide/experiment-management/index.md
@@ -1,89 +1,72 @@
# Experiment Management

_New in DVC 2.0_

Data science and ML are iterative processes that require a large number of
attempts to reach a certain level of a metric. Experimentation is part of the
development of data features, hyperspace exploration, deep learning
optimization, etc. DVC helps you codify and manage all of your
<abbr>experiments</abbr>, supporting these main approaches:

1. Create [experiments](#experiments) that derive from your latest project
version without having to track them manually. DVC does that automatically,
letting you list and compare them. The best ones can be made persistent, and
the rest archived.
2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series
of variations, forming a deep experiment. DVC helps you capture them at
runtime, and manage them in batches.
3. Make experiments or checkpoints [persistent](#persistent-experiments) by
committing them to your <abbr>repository</abbr>. Or create these versions
from scratch like typical project changes.

At this point you may also want to consider the different
[ways to organize](#organization-patterns) experiments in your project (as
Git branches, as folders, etc.).

DVC also provides specialized features to codify and analyze experiments.
optimization, etc.

Some of DVC's base features already help you codify and analyze experiments.
[Parameters](/doc/command-reference/params) are simple values you can tweak in a
human-readable text file, which cause different behaviors in your code and
models. On the other end, [metrics](/doc/command-reference/metrics) (and
formatted text file; they cause different behaviors in your code and models. On
Contributor

"cause different behaviors" -> "may modify the results" or something softer?

Contributor Author

Changing to "simple values in a formatted text file which you can tweak and use in your code". WDYT @iesahin ?

the other end, [metrics](/doc/command-reference/metrics) (and
[plots](/doc/command-reference/plots)) let you define, visualize, and compare
meaningful measures for the experimental results.

> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.
quantitative measures of your results.

## Experiments
<details>

`dvc exp` commands let you automatically track a variation to an established
[data pipeline](/doc/command-reference/dag). You can create multiple isolated
experiments this way, as well as review, compare, and restore them later, or
roll back to the baseline. The basic workflow goes like this:
## 💡 Run Cache: Automatic Log of Stage Runs

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
source code) of committed stages.
- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Use [metrics](/doc/command-reference/metrics) to identify the best
experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.

## Checkpoints in source code
Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
logs the unique signature of each stage run (in `.dvc/cache/runs` by default).
If it never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.

To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This allows you, for example, to track the progress
in deep learning techniques such as evolving neural networks.
✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (can be disabled),
which means DVC is already saving all of your tests and experiments behind the
scenes. But there's no easy way to explore it.

This kind of experiment tracks a series of variations (the checkpoints), and its
execution can be stopped and resumed as needed. You interact with it using
`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint`
field in `dvc.yaml` `outs`).
</details>
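
To make the run-cache idea above concrete, here is a conceptual Python sketch: a stage run is identified by a signature derived from its command and the hashes of its dependencies, and a repeated signature restores the earlier result instead of executing again. This is not DVC's implementation; the `RunCache` class, the `stage_signature` helper, and all hash values are hypothetical.

```python
import hashlib
import json


def stage_signature(cmd, dep_hashes):
    """Derive a stable signature from a stage command and its dependency hashes."""
    payload = json.dumps({"cmd": cmd, "deps": dep_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class RunCache:
    """Toy in-memory log of stage runs, keyed by signature."""

    def __init__(self):
        self._runs = {}
        self.executions = 0  # how many times a stage actually ran

    def run(self, cmd, dep_hashes, execute):
        key = stage_signature(cmd, dep_hashes)
        if key not in self._runs:
            # Never seen before: execute the stage and log its outputs.
            self.executions += 1
            self._runs[key] = execute()
        # Seen before: return the logged outputs without executing.
        return self._runs[key]


cache = RunCache()
deps = {"data.csv": "a1b2", "train.py": "c3d4"}
first = cache.run("python train.py", deps, lambda: {"model.pkl": "e5f6"})
second = cache.run("python train.py", deps, lambda: {"model.pkl": "e5f6"})
changed = cache.run(
    "python train.py", {**deps, "data.csv": "ffff"}, lambda: {"model.pkl": "0000"}
)
print(cache.executions)
```

The second call reuses the first result because the command and dependency hashes are identical; changing any dependency hash produces a new signature and triggers a fresh execution.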

> 📖 To learn more, see the dedicated
> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide.
## DVC Experiments

## Persistent experiments
_New in DVC 2.0_

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.
DVC experiment management features are designed to support these main
approaches:

1. [Run] and capture [experiments] that derive from your latest project version
without polluting your Git history. DVC tracks them for you, letting you list
and compare them. The best ones can be made persistent, and the rest left as
history or cleared.
1. [Queue] and process series of experiments based on a parameter search or
other modifications to your baseline.
1. Generate [checkpoints] during your code execution to analyze the internal
progress of deep experiments. DVC captures them at runtime, and can manage
them in batches.
1. Make experiments [persistent] by committing them to your
<abbr>repository</abbr> history.

[run]: /doc/user-guide/experiment-management/running-experiments
[experiments]: /doc/user-guide/experiment-management/dvc-experiments
[queue]:
/doc/user-guide/experiment-management/running-experiments#the-experiments-queue
[checkpoints]: /doc/user-guide/experiment-management/checkpoints
[persistent]:
/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments
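
As an illustration of the queue-based approach in the list above, the Python sketch below expands a small parameter grid into `dvc exp run --queue` command lines, followed by one `dvc exp run --run-all` to process the queue. It only builds and prints the commands. The helper name `queue_grid` and the parameter names `train.lr` and `train.epochs` are hypothetical.

```python
import itertools
import shlex


def queue_grid(grid):
    """Build one queued `dvc exp run` per parameter combination, then a run-all."""
    names = sorted(grid)  # fixed order so the output is deterministic
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        argv = ["dvc", "exp", "run", "--queue"]
        for name, value in zip(names, values):
            argv += ["--set-param", f"{name}={value}"]
        commands.append(argv)
    # Process every queued experiment.
    commands.append(["dvc", "exp", "run", "--run-all"])
    return commands


for argv in queue_grid({"train.lr": [0.01, 0.1], "train.epochs": [10, 20]}):
    print(shlex.join(argv))
```

A full grid grows multiplicatively with each parameter, so for larger searches you may want to sample combinations rather than enumerate them all.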

📖 More information in the
[full guide](/doc/user-guide/experiment-management/dvc-experiments).

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock`
pair in the <abbr>workspace</abbr> will codify the experiment as a new project
version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.
> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

### Organization patterns
### Organization Patterns

DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
experiments, it's up to you to decide how to organize them in your project.
These are the main alternatives:
It's up to you to decide how to organize completed experiments. These are the
Contributor Author @jorgeorpinel, Oct 10, 2021

From #2654 (review)

should it be somewhere inside? (not on the index page)?

@shcheklein maybe, let's decide... But not in scope for this PR probably?

main alternatives:

- **Git tags and branches** - use the repo's "time dimension" to distribute your
Contributor

I think dvc exp commands work towards this organization pattern, not separate directories. IMO we can modify this section to describe the organization pattern DVC leads. We don't have much facility to use "space dimension" for experiments.

Contributor Author

This index tries to cover the traditional experiments as well. It's not exclusively about dvc exp until you get to the sub-pages. That's one of the reasons for creating a separate DVC Experiments overview page in this PR.

Contributor

When I look into the text, I think DVC helps to organize the experiments in "space dimension" as well. What DVC does is better IMO, but mentioning these organization patterns seems to remind the reader a feature DVC lacks.

Contributor Author @jorgeorpinel, Dec 1, 2021

It would be great to improve on this indeed so it's clear that we're separating manual exp tracking you can do on DVC projects vs. the DVC Experiments workflow.
UPDATE: Wait I was confusing this discussion with #2909 (review)...

Contributor Author @jorgeorpinel, Dec 1, 2021

When I look into the text, I think DVC helps to organize the experiments in "space dimension" as well.

It can @iesahin, for example via multiple dvc.yaml files (copy/pasted + small changes) or via init --subrepo (monorepo structure). Both are supported by DVC.

Contributor Author

p.s. this makes me think about another route: using foreach stages to quickly define multiple experiments based on a params file and running them all (in parallel). I guess it's a pre-exp way to manage experiments with DVC, but in which you can see all the results at once in your workspace (may be messy unless you create a bunch of subdirectories so perhaps it's the same as the "space dimension")...

Contributor Author

p.p.s I added a section about custom labels as well (for org pattern) based on this table. See c68fc78

Contributor

It can @iesahin, for example via multiple dvc.yaml files (copy/pasted + small changes) or via init --subrepo (monorepo structure). Both are supported by DVC.

The dvc exp workflow is orthogonal to these features, they are not alternatives to each other. One can use multiple dvc.yaml files with dvc exp as well.

Contributor Author

They are conceptual alternatives which you can combine, which is already stated in the text.

experiments. This makes the most sense for experiments that build on each
Expand All @@ -99,15 +82,6 @@ These are the main alternatives:
Completely independent experiments live in separate directories, while their
progress can be found in different branches.

## Automatic log of stage runs (run-cache)

Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the
unique signature of each stage run (to `.dvc/cache/runs` by default). If it
never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.

✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (but can be
disabled with the `--no-run-cache` command option).
DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood so there's no need to decide on the above
until your experiments are made [persistent].
Expand Up @@ -226,7 +226,8 @@ Note that Git-ignored files/dirs are explicitly excluded from queued/temp runs
to avoid committing unwanted files into Git (e.g. once successful experiments
are [persisted]).

[persisted]: /doc/user-guide/experiment-management#persistent-experiments
[persisted]:
/doc/user-guide/experiment-management/dvc-experiments#persistent-experiments

> 💡 To include untracked files, stage them with `git add` first (before
> `dvc exp run`) and `git reset` them afterwards.