guide: expand Experiments guide #2654
Changes from 6 commits
@@ -1,11 +1,12 @@
---
name: Experiment
match: [experiment, experiments]
match: [experiment, experiments, 'DVC experiments']
tooltip: >-
An attempt to reach desired/better/interesting results during data pipelining
or ML model development. DVC is designed to help [manage
experiments](/doc/start/experiments), having [built-in
mechanisms](/doc/user-guide/experiment-management) like the
[run-cache](/doc/user-guide/project-structure/internal-files#run-cache) and
the `dvc experiments` commands (available on DVC 2.0 and above).
the [`dvc experiments`](/doc/command-reference/exp) commands (available on DVC
2.0 and above).
---
@@ -1,27 +1,32 @@
# Checkpoints

ML checkpoints are an important part of deep learning because ML engineers like
to save the model files at certain points during a training process.
_New in DVC 2.0_

With DVC experiments and checkpoints, you can:
To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This is especially helpful in machine learning, for
example to track the progress in deep learning techniques such as evolving
neural networks.

- Implement the best practice in deep learning to save your model weights as
_Checkpoint experiments_ track a series of variations (the checkpoints) and
their execution can be stopped and resumed as needed. You interact with them
using the `--rev` and `--reset` options of `dvc exp run` (see also the
`checkpoint` field in `dvc.yaml` `outs`). They can help you

- implement the best practice in deep learning to save your model weights as
checkpoints.
- Track all code and data changes corresponding to the checkpoints.
- See when metrics start diverging and revert to the optimal checkpoint.
- Automate the process of tracking every training epoch.
- track all code and data changes corresponding to the checkpoints.
- see when metrics start diverging and revert to the optimal checkpoint.
- automate the process of tracking every training epoch.

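The `--rev` and `--reset` options mentioned above can be illustrated with a small Python sketch. `checkpoint_run_command` is a hypothetical helper, not part of DVC: it only builds the `dvc exp run` argument list and never calls DVC. Rejecting both options at once is an assumption of this sketch, and `abc123` stands in for a real checkpoint reference.

```python
def checkpoint_run_command(rev=None, reset=False):
    """Build a `dvc exp run` argument list for a checkpoint experiment (hypothetical helper)."""
    if rev is not None and reset:
        # Assumption of this sketch: resuming and resetting are alternatives.
        raise ValueError("pass either rev or reset, not both")
    cmd = ["dvc", "exp", "run"]
    if reset:
        # --reset: drop the existing checkpoints and start the experiment over.
        cmd.append("--reset")
    if rev is not None:
        # --rev: continue the experiment from a specific checkpoint.
        cmd += ["--rev", rev]
    return cmd


print(checkpoint_run_command(rev="abc123"))  # ['dvc', 'exp', 'run', '--rev', 'abc123']
```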
[The way checkpoints are implemented by DVC](/blog/experiment-refs) utilizes
_ephemeral_ experiment commits and experiment branches within DVC. They are
created using the metadata from experiments and are tracked with the `exps`
custom Git reference.
> Experiments and checkpoints are [implemented](/blog/experiment-refs) with
> hidden Git experiment commits and branches.

You can add experiments to your Git history by committing the experiment you
want to track, which you'll see later in this tutorial.
Like with regular experiments, checkpoints can become persistent by
[committing them to Git](#committing-checkpoints-to-git).

This tutorial is going to cover how to implement checkpoints in an ML project
using DVC. We're going to train a model to identify handwritten digits based on
the MNIST dataset.
This guide covers how to implement checkpoints in an ML project using DVC. We're
going to train a model to identify handwritten digits based on the MNIST
dataset.

Review comment (marked as resolved): Not this PR: below: Setting up the project ... should we do

<details>

@@ -62,9 +67,9 @@ everything you need to get started with experiments and checkpoints.

## Setting up a DVC pipeline

DVC versions data and it also can version the machine learning model weights
file as checkpoints during the training process. To enable this, you will need
to set up a DVC pipeline to train your model.
DVC versions data and it also can version the ML model weights file as
checkpoints during the training process. To enable this, you will need to set up
a DVC pipeline to train your model.

Adding a DVC pipeline only takes a few commands. At the root of the project,
run:

@@ -190,9 +195,8 @@ You can read about what the line `dvclive.log(k, v)` does in the

The [`dvclive.next_step()`](/doc/dvclive/api-reference/next_step) line tells DVC
that it can take a snapshot of the entire workspace and version it with Git.
It's important that with this approach only code with metadata is versioned in
Git (as an ephemeral commit), while the actual model weight file will be stored
in the DVC data cache.
It's important that with this approach only code with metadata is versioned,
while the actual model weight file will be stored in the DVC data cache.

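A minimal Python sketch of a training script following this pattern. It assumes the DVC 2.0-era module-level `dvclive` API (`dvclive.log` and `dvclive.next_step`, as referenced in this guide) and that `model.json` is declared as an output with `checkpoint: true` in `dvc.yaml`. The weights, training step, and metric are hypothetical placeholders; when that `dvclive` API is unavailable, a no-op stand-in keeps the sketch runnable.

```python
import json


class _NoOpLive:
    """Fallback used when the module-level dvclive API is unavailable; records nothing."""

    def log(self, name, value):
        pass

    def next_step(self):
        pass


def load_live():
    # Assumes the DVC 2.0-era module-level API: dvclive.log() and dvclive.next_step().
    try:
        import dvclive
    except ImportError:
        return _NoOpLive()
    if hasattr(dvclive, "log") and hasattr(dvclive, "next_step"):
        return dvclive
    return _NoOpLive()


def train(epochs=3, model_path="model.json"):
    live = load_live()
    weights = {"w": 0.0}  # hypothetical stand-in for real model weights
    history = []
    for _ in range(epochs):
        weights["w"] += 0.1  # hypothetical training step
        acc = min(1.0, 0.5 + weights["w"] / 10)  # hypothetical metric
        # Overwrite the same weights file every epoch; with `checkpoint: true`
        # on this output, DVC can keep each version in its cache.
        with open(model_path, "w") as f:
            json.dump(weights, f)
        live.log("acc", acc)
        live.next_step()  # the point where DVC may snapshot a checkpoint
        history.append(acc)
    return history
```

Under `dvc exp run`, each `next_step()` call marks where a checkpoint can be recorded; run directly with `python`, the loop just trains and overwrites `model.json`.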
## Running experiments

@@ -407,39 +411,25 @@ new set of checkpoints under a new experiment branch.
```
└─────────────────────────┴──────────┴──────┴─────────┴────────┴────────┴────────┴──────────────┘
```

## Adding checkpoints to Git
## Committing checkpoints to Git

When you terminate training, you'll see a few commands in the terminal that will
allow you to add these changes to Git.
allow you to add these changes to Git, making them persistent:

```
```dvc
To track the changes with git, run:

git add dvclive.json dvc.yaml .gitignore train.py dvc.lock

Reproduced experiment(s): exp-263da
Experiment results have been applied to your workspace.

To promote an experiment to a Git branch run:

dvc exp branch <exp>
```

You can run the following command to save your experiments to the Git history.

```bash
$ git add dvclive.json dvc.yaml .gitignore train.py dvc.lock
...
```

You can take a look at what will be committed to your Git history by running:
Running the command above will stage the checkpoint experiment with Git. You can
take a look at what would be committed first with `git status`. You should see
something similar to this in your terminal:

```bash
```dvc
$ git status
```

You should see something similar to this in your terminal.

```
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitignore
@@ -456,7 +446,7 @@ Untracked files:
predictions.json
```

All that's left is to commit these changes with the following command:
All that's left to do is to `git commit` the changes:

```bash
$ git commit -m 'saved files from experiment'
```
@@ -0,0 +1,33 @@
## Experiments

_New in DVC 2.0_

`dvc exp` commands let you automatically track a variation to an established
[data pipeline](/doc/command-reference/dag). You can create multiple isolated
experiments this way, as well as review, compare, and restore them later, or
roll back to the baseline. The basic workflow goes like this:

Review comment: not for this PR:
or even better - get rid of this sentence :) It does more harm than good to my mind.

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
source code) of committed stages.
- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Use `dvc metrics` to identify the best experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.

Review comment: not for this PR: Modify hyperparameters (link to params), code ... etc ... we are being way too formal here again. I think we can sacrifice it a bit and use some common ML terminology

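The workflow above can be sketched in Python as an ordered list of commands. This is a hypothetical helper, not a DVC API: the parameter `train.lr`, the `params.yaml` file, the experiment name `exp-263da`, and the commit message are placeholders, and by default the commands are only printed.

```python
import subprocess


def experiment_commands(param, values, best_exp, message):
    """Build the basic `dvc exp` workflow as argument lists (hypothetical helper)."""
    cmds = []
    for value in values:
        # Modify one parameter and run the pipeline as a new experiment.
        cmds.append(["dvc", "exp", "run", "--set-param", f"{param}={value}"])
    cmds.append(["dvc", "exp", "show"])  # compare the experiments in a table
    cmds.append(["dvc", "exp", "apply", best_exp])  # restore the chosen one
    # Commit the applied results to make the experiment persistent.
    cmds.append(["git", "add", "dvc.yaml", "dvc.lock", "params.yaml"])
    cmds.append(["git", "commit", "-m", message])
    return cmds


def run_commands(cmds, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_commands(experiment_commands("train.lr", [0.01, 0.1], "exp-263da", "Use best learning rate"))
```

In a real session you would pick the experiment to apply only after reading the `dvc exp show` table (and optionally `dvc exp diff`), so the sketch takes that name as an argument for brevity.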
## Persistent Experiments

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow, `dvc.yaml` and `dvc.lock` will define the experiment as a
new project version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.

Use `dvc exp apply` and `dvc exp branch` to persist experiments in your Git
history.
@@ -1,113 +1,64 @@
# Experiment Management

_New in DVC 2.0_

Data science and ML are iterative processes that require a large number of
attempts to reach a certain level of a metric. Experimentation is part of the
development of data features, hyperspace exploration, deep learning
optimization, etc. DVC helps you codify and manage all of your
<abbr>experiments</abbr>, supporting these main approaches:

1. Create [experiments](#experiments) that derive from your latest project
version without having to track them manually. DVC does that automatically,
letting you list and compare them. The best ones can be made persistent, and
the rest archived.
2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series
of variations, forming a deep experiment. DVC helps you capture them at
runtime, and manage them in batches.
3. Make experiments or checkpoints [persistent](#persistent-experiments) by
committing them to your <abbr>repository</abbr>. Or create these versions
from scratch like typical project changes.
optimization, etc.

At this point you may also want to consider the different
[ways to organize](#organization-patterns) experiments in your project (as
Git branches, as folders, etc.).

DVC also provides specialized features to codify and analyze experiments.
Some of DVC's base features already help you codify and analyze experiments.
[Parameters](/doc/command-reference/params) are simple values you can tweak in a
human-readable text file, which cause different behaviors in your code and
models. On the other end, [metrics](/doc/command-reference/metrics) (and
formatted text file; they cause different behaviors in your code and models. On
the other end, [metrics](/doc/command-reference/metrics) (and
[plots](/doc/command-reference/plots)) let you define, visualize, and compare
meaningful measures for the experimental results.

> 👨💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

## Experiments

`dvc exp` commands let you automatically track a variation to an established
[data pipeline](/doc/command-reference/dag). You can create multiple isolated
experiments this way, as well as review, compare, and restore them later, or
roll back to the baseline. The basic workflow goes like this:

- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
source code) of committed stages.
- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Use [metrics](/doc/command-reference/metrics) to identify the best
experiment(s).
- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
🔄
- Use `dvc exp apply` to roll back to the best one.
- Make the selected experiment persistent by committing its results to Git. This
cleans the slate so you can repeat the process.

## Checkpoints in source code

To track successive steps in a longer experiment, you can register checkpoints
from your code at runtime. This allows you, for example, to track the progress
in deep learning techniques such as evolving neural networks.
quantitative measures of your results.

This kind of experiments track a series of variations (the checkpoints) and its
execution can be stopped and resumed as needed. You interact with them using
`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint`
field in `dvc.yaml` `outs`).
<details>

> 📖 To learn more, see the dedicated
> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide.
## 💡 Run Cache: Automatic Log of Stage Runs

## Persistent experiments
Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
logs the unique signature of each stage run (in `.dvc/cache/runs` by default).
If it never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.

Review comment: sounds too much for the index page? and too abrupt to be honest (even though it is in the details section)

When your experiments are good enough to save or share, you may want to store
them persistently as Git commits in your <abbr>repository</abbr>.
✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (can be disabled),
which means DVC is already saving all of your tests and experiments behind the
scenes. But there's no easy way to explore it.

Whether the results were produced with `dvc repro` directly, or after a
`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock`
pair in the <abbr>workspace</abbr> will codify the experiment as a new project
version. The right <abbr>outputs</abbr> (including
[metrics](/doc/command-reference/metrics)) should also be present, or available
via `dvc checkout`.
</details>

### Organization patterns
## DVC Experiments

Review comment: same here - it's clear that this is about dvc and about experiments - what is the intention of this subsection? what is the intention behind the index page?

DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
experiments, it's up to you to decide how to organize them in your project.
These are the main alternatives:
_New in DVC 2.0_

- **Git tags and branches** - use the repo's "time dimension" to distribute your
experiments. This makes the most sense for experiments that build on each
other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can
be easily visualized, for example with tools
[like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).
- **Directories** - the project's "space dimension" can be structured with
directories (folders) to organize experiments. Useful when you want to see all
your experiments at the same time (without switching versions) by just
exploring the file system.
- **Hybrid** - combining an intuitive directory structure with a good repo
branching strategy tends to be the best option for complex projects.
Completely independent experiments live in separate directories, while their
progress can be found in different branches.
The `dvc experiments` features are designed to support these main approaches:

1. Create [experiments] that derive from your latest project version without
polluting your Git history. DVC tracks them for you, letting you list and
compare them. The best ones can be made persistent, and the rest left as
history or cleared.
1. [Queue] and process series of experiments based on a parameter search or
other modifications to your baseline.
1. Generate [checkpoints] during your code execution to analyze the internal
progress of deep experiments. DVC captures them at runtime, and can manage
them in batches.
1. Make experiments [persistent] by committing them to your
<abbr>repository</abbr> history.

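The queueing approach listed above (processing a series of experiments from a parameter search) could look like the following hypothetical Python helper, which expands a grid of values into queued runs. It assumes the `dvc exp run` options `--queue`, `--set-param`, `--run-all`, and `--jobs`; the parameter names `train.lr` and `train.epochs` are placeholders, and the helper only builds argument lists.

```python
from itertools import product


def queue_commands(grid, jobs=2):
    """Expand a parameter grid into queued `dvc exp run` commands (hypothetical helper)."""
    names = sorted(grid)  # fixed order so the generated commands are reproducible
    cmds = []
    for combo in product(*(grid[name] for name in names)):
        set_params = []
        for name, value in zip(names, combo):
            set_params += ["--set-param", f"{name}={value}"]
        # --queue records the experiment without executing it yet.
        cmds.append(["dvc", "exp", "run", "--queue", *set_params])
    # Execute every queued experiment, `jobs` at a time.
    cmds.append(["dvc", "exp", "run", "--run-all", "--jobs", str(jobs)])
    return cmds


for cmd in queue_commands({"train.lr": [0.01, 0.1], "train.epochs": [5]}):
    print(" ".join(cmd))
```

Generating one queued run per grid point keeps the search itself in plain code, while DVC tracks each resulting experiment.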
[experiments]: /doc/user-guide/experiment-management/experiments
[queue]: /doc/command-reference/exp/run#queueing-and-parallel-execution
[checkpoints]: /doc/user-guide/experiment-management/checkpoints
[persistent]:
/doc/user-guide/experiment-management/experiments#persistent-experiments

## Automatic log of stage runs (run-cache)
> 👨💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the
unique signature of each stage run (to `.dvc/cache/runs` by default). If it
never happened before, the stage command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.
You may also want to consider the different [ways to organize experiments] in
your project (as Git branches, as folders, etc.).

✅ This built-in feature is called <abbr>run-cache</abbr> and it can
dramatically improve performance. It's enabled out-of-the-box (but can be
disabled with the `--no-run-cache` command option).
[ways to organize experiments]:
/doc/user-guide/experiment-management/organization
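To make the run-cache idea concrete, here is a toy Python model of it. It is not DVC's implementation (the real run-cache lives under `.dvc/cache/runs` and is built from real dependency hashes); the `RunCache` class and its arguments are hypothetical. It only shows the principle: a signature built from the command, the dependency hashes, and the parameters decides whether a stage executes or its previous results are restored.

```python
import hashlib
import json


class RunCache:
    """Toy model of the run-cache idea: skip a stage whose signature was seen before."""

    def __init__(self):
        self._runs = {}      # signature -> recorded outputs
        self.executions = 0  # how many times a stage command actually ran

    @staticmethod
    def signature(cmd, deps, params):
        # Hash the command, the dependency content hashes, and the parameters.
        payload = json.dumps({"cmd": cmd, "deps": deps, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, cmd, deps, params, func):
        sig = self.signature(cmd, deps, params)
        if sig in self._runs:
            return self._runs[sig]  # same conditions: restore previous results
        self.executions += 1
        outputs = func(deps, params)  # never seen before: execute normally
        self._runs[sig] = outputs
        return outputs


cache = RunCache()
fake_train = lambda deps, params: {"acc": 0.9}  # hypothetical stage
cache.run("python train.py", {"data.csv": "1a2b"}, {"lr": 0.01}, fake_train)
cache.run("python train.py", {"data.csv": "1a2b"}, {"lr": 0.01}, fake_train)
print(cache.executions)  # 1: the second call was restored from the cache
```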
@@ -0,0 +1,20 @@
### Organization Patterns

DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
experiments, it's up to you to decide how to organize them in your project.
These are the main alternatives:

- **Git tags and branches** - use the repo's "time dimension" to distribute your
experiments. This makes the most sense for experiments that build on each
other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can
be easily visualized, for example with tools
[like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).
- **Directories** - the project's "space dimension" can be structured with
directories (folders) to organize experiments. Useful when you want to see all
your experiments at the same time (without switching versions) by just
exploring the file system.
- **Hybrid** - combining an intuitive directory structure with a good repo
branching strategy tends to be the best option for complex projects.
Completely independent experiments live in separate directories, while their
progress can be found in different branches.

Review comment: organization -> Organizing Project ?
organization is way too abstract ... essentially it's about how do you structure the workflow?
to be honest very hard to tell if the existing structure makes sense - would be great to see more content that you have in mind for each section

Review comment: I merged the Organization Patterns section back into the index so it has more context. If we get more content for this we can split and rename.