guide: DVC Experiments Overview #2909
# DVC Experiments Overview

DVC Experiments are captured automatically by DVC when [run]. Each experiment
creates and tracks a variation of your data science project based on the changes
in your <abbr>workspace</abbr>.

Experiments keep a link to the commit currently checked out (Git `HEAD`) as
their parent, or _baseline_, but they are not part of the regular Git tree or
workflow (unless you make them [persistent]). This avoids cluttering Git
namespaces such as branches, and it keeps the repository from growing
unnecessarily.

[run]: /doc/user-guide/experiment-management/running-experiments

<details>

### ⚙️ How does DVC track experiments?

Experiments are custom [Git references](/blog/experiment-refs) (found in
`.git/refs/exps`) with one or more commits based on `HEAD`. These commits are
hidden and not checked out by DVC. Note that these are not pushed to Git remotes
by default either (see `dvc exp push`).
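Because experiments are ordinary Git refs, standard Git plumbing can list them. The sketch below is a hypothetical helper (`list_experiment_refs` is not part of DVC) that shells out to `git for-each-ref`. The exact layout of ref names under `refs/exps` is a DVC implementation detail and may differ between versions, so treat the output as informational only.

```python
import subprocess


def list_experiment_refs(repo_dir: str = ".") -> list[str]:
    """Return the full names of all Git refs under refs/exps.

    `git for-each-ref` reports both loose refs (files in .git/refs/exps)
    and packed refs (lines in .git/packed-refs), unlike a plain directory walk.
    """
    result = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname)", "refs/exps"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if repo_dir is not a Git repo
    )
    # One ref name per line; an empty result means no experiments exist.
    return [line for line in result.stdout.splitlines() if line]
```

Calling `list_experiment_refs("path/to/repo")` after a few `dvc exp run` calls should return one or more `refs/exps/...` names. Since a plain `git push` only pushes branches by default, these refs stay local unless you share them explicitly.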

Each experiment needs a unique name to identify it. DVC usually auto-generates
one, such as `exp-bfe64` (based on the experiment's hash), but you can set a
custom name instead with the `--name`/`-n` option of `dvc exp run`. These names
can be used to reference experiments in other `dvc exp` subcommands.
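When choosing a custom name, a reasonable precaution follows from the ref layout described above. We assume (this is not stated on this page) that the name ends up inside a Git ref path, so it must at least satisfy Git's ref-name rules; DVC may enforce further rules of its own. The hypothetical helpers below check a candidate with `git check-ref-format` and then build, without executing, the matching `dvc exp run --name` command.

```python
import subprocess


def is_valid_ref_component(name: str) -> bool:
    """Return True if `name` works as a single path component of a Git ref.

    ASSUMPTION: experiment names are stored inside the ref path, so Git's own
    rules (checked by `git check-ref-format`) apply to them.
    """
    if not name or "/" in name:
        return False  # keep the name to one path component
    result = subprocess.run(
        ["git", "check-ref-format", f"refs/exps/{name}"],
        capture_output=True,  # the command prints nothing; only the exit code matters
    )
    return result.returncode == 0


def exp_run_argv(name: str) -> list[str]:
    """Build (but do not execute) `dvc exp run --name <name>`."""
    if not is_valid_ref_component(name):
        raise ValueError(f"not usable as an experiment name: {name!r}")
    return ["dvc", "exp", "run", "--name", name]
```

For example, `exp_run_argv("lr-0.01")` yields a command you could pass to `subprocess.run`, while a name containing a space or `..` is rejected before DVC ever sees it.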

</details>

## Basic workflow

`dvc exp` commands let you automatically track a variation of a project version
(the baseline). You can create independent groups of experiments this way, as
well as review, compare, and restore them later. The basic workflow goes like
this:

- Modify hyperparameters or other dependencies (input data, source code,
  commands to execute, etc.). Leave these changes uncommitted in Git.
- [Run experiments][run] with `dvc exp run` (instead of `dvc repro`). The
  results are reflected in your <abbr>workspace</abbr> and tracked
  automatically.
- Review and [compare] experiments with `dvc exp show` or `dvc exp diff`, using
  [metrics](/doc/command-reference/metrics) to identify the best one(s).
  Repeat 🔄
- Make certain experiments [persistent] by committing their results to Git. This
  lets you repeat the process from that point.
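The run-and-compare loop above lends itself to scripting. The sketch below is hypothetical (the `sweep` helper and the example parameter name `train.lr` are illustrations, not part of DVC). It relies on the `--set-param` option of `dvc exp run`, which is not covered on this page, to change one parameter in the workspace per run, then calls `dvc exp show` to tabulate the results. The command runner is injectable so the loop can be dry-run or tested without DVC installed.

```python
import subprocess
from typing import Callable, Iterable, Sequence

# A runner executes one command; swap it out to dry-run or test the loop.
Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)


def sweep(param: str, values: Iterable[object], run: Runner = run_checked) -> list[list[str]]:
    """Run one DVC experiment per value of `param`, then show all results.

    Returns the commands that were issued, in order.
    """
    issued: list[list[str]] = []
    for value in values:
        # --set-param edits the parameter in the workspace before running,
        # leaving the change uncommitted as the workflow expects.
        argv = ["dvc", "exp", "run", "--set-param", f"{param}={value}"]
        run(argv)
        issued.append(argv)
    # Tabulate experiments (params and metrics) so they can be compared.
    show = ["dvc", "exp", "show"]
    run(show)
    issued.append(show)
    return issued
```

Calling `sweep("train.lr", [0.1, 0.01], run=print)` prints the commands instead of executing them; the default runner executes them in order and stops at the first failure.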

[pipeline]: /doc/user-guide/project-structure/pipelines-files
[compare]: /doc/user-guide/experiment-management/comparing-experiments
[persistent]: /doc/user-guide/experiment-management/persisting-experiments

## Initialize DVC Experiments on any project

DVC Experiments build on the basic semantics of <abbr>DVC projects</abbr>, so
only minimal setup is required.

`dvc exp init` lets you quickly onboard an existing data science project to use
DVC Experiments without having to bootstrap DVC manually. You can either supply
a `command` to execute your experiments or use the `--interactive` flag (`-i`)
to be prompted for that and other optional customizations.
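The two ways of calling `dvc exp init` described above differ only in their arguments. The hypothetical helper below (`exp_init_argv` is not a DVC API) builds, without executing, the argument list for either mode. It splits the command into words with `shlex`, mirroring what you would type at a shell prompt; we assume DVC treats these words the same as when typed directly.

```python
import shlex
from typing import Optional


def exp_init_argv(command: Optional[str] = None, interactive: bool = False) -> list[str]:
    """Build a `dvc exp init` invocation from a command and/or the -i flag."""
    if command is None and not interactive:
        raise ValueError("supply a command or set interactive=True")
    argv = ["dvc", "exp", "init"]
    if interactive:
        # Options go before the command so DVC does not treat them as part of it.
        argv.append("--interactive")
    if command is not None:
        # Split like a POSIX shell would, e.g. "python src/train.py".
        argv += shlex.split(command)
    return argv
```

For example, `exp_init_argv("python src/train.py")` corresponds to typing `dvc exp init python src/train.py`, and `exp_init_argv(interactive=True)` to `dvc exp init --interactive`.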

This creates a simple `dvc.yaml` file for you. It uses sensible default
locations for your project's <abbr>dependencies</abbr> (data, parameters, source
code) and <abbr>outputs</abbr> (ML models or other artifacts,
<abbr>metrics</abbr>, etc.), which you can customize via `-i` or other options
of `dvc exp init`.
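Before accepting the defaults, it can help to check whether your project's inputs already live where `dvc exp init` expects them. The paths below are an ASSUMPTION, our understanding of the DVC 2.x defaults (`data`, `src`, `params.yaml`); verify them with `dvc exp init --help` for your installed version. Outputs such as models, metrics, and plots are produced by your command, so this hypothetical `missing_inputs` helper does not check them.

```python
from pathlib import Path

# ASSUMPTION: default input locations of `dvc exp init` (DVC 2.x era).
# Confirm with `dvc exp init --help`; outputs are deliberately not listed.
ASSUMED_INPUT_DEFAULTS = {
    "data": "data",
    "code": "src",
    "params": "params.yaml",
}


def missing_inputs(project_dir: str = ".") -> dict[str, str]:
    """Return the assumed default input paths that do not exist yet."""
    root = Path(project_dir)
    return {
        role: rel_path
        for role, rel_path in ASSUMED_INPUT_DEFAULTS.items()
        if not (root / rel_path).exists()
    }
```

An empty result suggests the defaults fit your layout; otherwise, point `dvc exp init` at the right locations with `-i` (or its other options) rather than moving files around.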

Review the generated files (and commit them to Git) to begin using DVC
Experiments. You can then move on to [running your experiments][run] (next).

[codify a pipeline]: /doc/user-guide/project-structure/pipelines-files