Parameters: separate info among docs sections #3899

Merged
merged 105 commits into from
Sep 7, 2022
105 commits
f91f015
initial plan and some content
Apr 5, 2022
fd20226
added content about stages
Apr 6, 2022
b279db7
title and restyle fixes
iesahin Apr 13, 2022
881f3af
added dag section
iesahin Apr 15, 2022
0e95212
depend to -> depend on
iesahin Apr 15, 2022
78f342d
added dependencies section
iesahin Apr 20, 2022
b2a949d
Update content/docs/user-guide/pipelines/index.md
jorgeorpinel May 9, 2022
9d5ee22
Update content/docs/user-guide/pipelines/index.md
jorgeorpinel May 9, 2022
678753d
Restyled by prettier (#3532)
restyled-io[bot] May 12, 2022
7fe19e7
Update content/docs/user-guide/pipelines/index.md
iesahin Jun 7, 2022
e932c8d
added pipelines to sidebar
iesahin Jun 7, 2022
baf6ce1
updated the title
iesahin Jun 7, 2022
caf3291
fixed formatting
iesahin Jun 7, 2022
2e98221
updating for dvc.yaml first
iesahin Jun 9, 2022
15ba67f
fed -> used
iesahin Jun 9, 2022
d749c2a
dvc.yaml-first
iesahin Jun 9, 2022
f01e750
editing to tell dvc.yaml first
iesahin Jun 9, 2022
78cfebd
minor fix
iesahin Jun 9, 2022
9989902
url dependency
iesahin Jun 9, 2022
f51bfd5
dvc lock example
iesahin Jun 9, 2022
5edc29b
section titles for deps
iesahin Jun 9, 2022
9649c0f
section titles for outputs
iesahin Jun 9, 2022
bb7c0a1
reproduction -> running
iesahin Jun 9, 2022
e26734e
adding hyperparameters section
iesahin Jun 9, 2022
424a196
added experiments section
iesahin Jun 14, 2022
e43bc98
adding url dependencies
iesahin Jun 14, 2022
ee7703c
added outputs section content
iesahin Jun 15, 2022
602e327
minor
iesahin Jun 24, 2022
1b08b42
added running pipelines content
iesahin Jun 24, 2022
1ff0479
moved outputs below running
iesahin Jun 24, 2022
27ef04f
removed plots section header
iesahin Jun 24, 2022
c2c0461
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Jul 13, 2022
7df801a
guide: Defining Data Pipelines
jorgeorpinel Jul 14, 2022
605a000
guide: split up Data Pipelines section
jorgeorpinel Jul 14, 2022
f22f2a0
Update content/docs/command-reference/plots/templates.md
jorgeorpinel Jul 19, 2022
928ad25
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Jul 20, 2022
73982c2
guide: Data Pipes -> ML Pipes
jorgeorpinel Jul 20, 2022
a97006e
guide: oops, remove op-pipes file
jorgeorpinel Jul 20, 2022
f9f0a59
guide: remoge ML Pipes intro
jorgeorpinel Jul 20, 2022
53b4321
guide: mention both imports in Def ML Pipes
jorgeorpinel Jul 20, 2022
b6d8a0c
guide: move DAG info from cmd ref
jorgeorpinel Jul 20, 2022
60af6a7
guide: move all info and links about DAG to ML Pipes
jorgeorpinel Jul 20, 2022
830fe2b
guide: point from some Stage links to ML Pipes
jorgeorpinel Jul 20, 2022
cb042af
guide: delete Running ML Pipes (for now)
jorgeorpinel Jul 20, 2022
b8844da
nav: remove future ML Pipes guides
jorgeorpinel Jul 20, 2022
2040246
guide: remove ML Pipes/ Experimental Pipes
jorgeorpinel Jul 20, 2022
192e189
roll back unrelated changes...
jorgeorpinel Jul 20, 2022
078b50d
guide: roll back dvc.yaml page changes
jorgeorpinel Jul 20, 2022
fed4832
guide: link ML Pipes/ Defining Stages to dvc.yaml/stages spec
jorgeorpinel Jul 20, 2022
49780b6
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Jul 21, 2022
3410b6a
guide: link deps and params tooltip to ML Pipes/ Stages guide sections
jorgeorpinel Jul 21, 2022
0ca32c2
guide: links from dvc.yaml doc to ML Pipes/ Stages
jorgeorpinel Jul 21, 2022
8a55302
guide: more links
jorgeorpinel Jul 21, 2022
7ebc3ad
guide: oops, remove unused files
jorgeorpinel Jul 21, 2022
7461b07
remove unrelated change
jorgeorpinel Jul 21, 2022
bea5368
guide: move stage definition details to ML Pipes
jorgeorpinel Jul 21, 2022
e44433c
guide: move stage command details into ML Pipes
jorgeorpinel Jul 21, 2022
e47d741
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Jul 28, 2022
a4824a1
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 1, 2022
1411ec7
ref: roll back unrelated changes
jorgeorpinel Aug 2, 2022
9800a35
.
jorgeorpinel Aug 2, 2022
43517c4
ref: few more links to dependency graph in guide
jorgeorpinel Aug 2, 2022
0b81e74
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 3, 2022
09f5aaf
ref: reorg exp init to include simple usage example in Desc
jorgeorpinel Aug 3, 2022
ded42ff
concept: reintroduce DAG in more places
jorgeorpinel Aug 4, 2022
8ebd1bb
guide: pipelines are not ML-specific
jorgeorpinel Aug 4, 2022
55afd17
guide: more details for params fields
jorgeorpinel Aug 4, 2022
076402f
one word
jorgeorpinel Aug 8, 2022
e065653
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 21, 2022
adc4382
guide: restructure Def Pipes and
jorgeorpinel Aug 22, 2022
3f8e29a
guide: rewrite Def Pipes intro
jorgeorpinel Aug 22, 2022
4f3391a
guide: move DAG up in Def Pipes
jorgeorpinel Aug 22, 2022
86e5b18
guide: inner link in Def Pipes
jorgeorpinel Aug 22, 2022
8e88b7f
guide: fix link and typos
jorgeorpinel Aug 22, 2022
9216380
start: revert DAG changes
jorgeorpinel Aug 23, 2022
51151c8
guide: use typical ML stage names
jorgeorpinel Aug 23, 2022
50b2513
guide: better flow in Pipes index
jorgeorpinel Aug 23, 2022
29790fd
glossary: high-level def of Pipes
jorgeorpinel Aug 23, 2022
dc3b6bc
guide: move Stage command to dvc.yaml ref
jorgeorpinel Aug 23, 2022
dadd099
guide: remove abc mention
jorgeorpinel Aug 23, 2022
0283efc
guide: edits to Defining Pipes
jorgeorpinel Aug 24, 2022
b55ac41
guide: improve Param deps in Def Pipes and
jorgeorpinel Aug 24, 2022
b947c67
guide: add Outputs to Def Pipes
jorgeorpinel Aug 24, 2022
bfe0ec2
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 24, 2022
8505bc7
guide: update dep, param and out tooltips
jorgeorpinel Aug 24, 2022
57ed5c0
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 25, 2022
627afb5
guide: separate params in Pipes vs Exps
jorgeorpinel Aug 25, 2022
502b3bd
ref: move Stage commands section of dvc.yaml up
jorgeorpinel Aug 25, 2022
533a1b0
guide: update Def Pipes and DAG
jorgeorpinel Aug 25, 2022
c5cbf58
params: more separation of content and
jorgeorpinel Aug 25, 2022
416d312
concept: rehash params
jorgeorpinel Aug 25, 2022
75340be
guide: more holistic pipelining info
jorgeorpinel Aug 26, 2022
dc16807
guide: Pipe edits
jorgeorpinel Aug 26, 2022
23cd9c6
params: roll back changes for now...
jorgeorpinel Aug 26, 2022
72bd5ee
Revert "params: roll back changes for now..."
jorgeorpinel Aug 26, 2022
3c8f635
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 29, 2022
ef24051
guide: more edits to Pipelines
jorgeorpinel Aug 31, 2022
a542867
Merge branch 'main' into iesahin/ug-pipelines
jorgeorpinel Aug 31, 2022
a316948
Merge branch 'main' into params/refactor
jorgeorpinel Aug 31, 2022
b5a2d9f
params: do not define as "simple values"
jorgeorpinel Aug 31, 2022
80cb220
Merge branch 'iesahin/ug-pipelines' into params/refactor
jorgeorpinel Sep 2, 2022
4a590f8
ref: better params index intro
jorgeorpinel Sep 2, 2022
11ed801
ref: mention param groups in dvc.yaml
jorgeorpinel Sep 2, 2022
4005a8a
params: DVC can pass them via templating/dict unpacking
jorgeorpinel Sep 2, 2022
6bf10e7
Merge branch 'main' into params/refactor
jorgeorpinel Sep 7, 2022
2 changes: 1 addition & 1 deletion content/docs/command-reference/exp/run.md
@@ -224,7 +224,7 @@ train_config.json train.weight_decay - 0.001

Note that `exp run --set-param` (`-S`) doesn't update your `dvc.yaml`. When
appending or removing <abbr>parameters</abbr>, make sure to update the
[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameter-dependencies)
[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameters)
of your `dvc.yaml` accordingly.

</admon>
14 changes: 8 additions & 6 deletions content/docs/command-reference/params/diff.md
@@ -1,8 +1,7 @@
# params diff

Show changes in [parameters](/doc/command-reference/params) between commits in
the <abbr>DVC repository</abbr>, or between a commit and the
<abbr>workspace</abbr>.
Show changes in `dvc params` between commits in the <abbr>DVC repository</abbr>,
Comment (Member):

previous was better I think

Comment (Contributor Author):

Happy to roll back. The idea here is that the cmd ref should be self-contained so linking between references when possible makes sense. Alternatively we could use the tooltip instead of link.

Comment (Contributor Author):

Feel free to commit:

Suggested change
Show changes in `dvc params` between commits in the <abbr>DVC repository</abbr>,
Show changes in <abbr>params</abbr> between commits in the <abbr>DVC repository</abbr>,

or between a commit and the <abbr>workspace</abbr>.

> Requires that Git is being used to version the project.

@@ -21,12 +20,15 @@ positional arguments:

## Description

Provides a quick way to compare parameter values among experiments in the
Provides a quick way to compare <abbr>parameters</abbr> among experiments in the
repository history. The differences shown by this command include the old and
new param values, along with the param name.

> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the `-p` (`--params`) option of `dvc stage add`).
<admon type="info">

Parameters are defined in the `params` field of `dvc.yaml`. See `dvc params`.

</admon>

Without arguments, `dvc params diff` compares parameters currently present in
the <abbr>workspace</abbr> (uncommitted changes) with the latest committed
115 changes: 63 additions & 52 deletions content/docs/command-reference/params/index.md
@@ -1,6 +1,6 @@
# params

Contains a command to show changes in parameters:
Contains a command to show changes in <abbr>parameters</abbr>:
[diff](/doc/command-reference/params/diff).

## Synopsis
@@ -16,62 +16,69 @@ positional arguments:

## Description

In order to track parameters and hyperparameters associated to machine learning
experiments in <abbr>DVC projects</abbr>, DVC provides a different type of
dependencies: _parameters_. They usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc.
Parameters can be any values used inside your code to influence the results
(e.g. machine learning [hyperparameters]). DVC can track these as key/value
pairs from structured YAML 1.2, JSON, TOML 1.0,
[or Python](#examples-python-parameters-file) files (`params.yaml` by default).
Params usually have simple names like `epochs`, `learning-rate`, `batch_size`,
etc. Example:

To start tracking parameters, list them under the `params` field of `dvc.yaml`
stages (manually or with the `-p`/`--params` option of `dvc stage add`). For
example:
```yaml
epochs: 900
tuning:
- learning-rate: 0.945
- max_depth: 7
paths:
- labels: 'materials/labels'
- truth: 'materials/ground'
```

To start tracking parameters, list their names under the `params` field of
`dvc.yaml` (manually or with the `-p`/`--params` option of `dvc stage add`).
For example:

```yaml
stages:
  learn:
    cmd: ./deep.py
    cmd: python deep.py # reads params.yaml internally
    params:
      - epochs # track specific parameter (from params.yaml)
      - tuning.learning-rate
      - myparams.toml: # track specific params from custom file
          - batch_size
      - config.json: # track all parameters in this file
      - epochs # specific param from params.yaml
      - tuning.learning-rate # nested param from params.yaml
      - paths # entire group from params.yaml
      - myparams.toml:
          - batch_size # param from custom file
      - config.json: # all params in this file
```

In contrast to a regular <abbr>dependency</abbr>, a parameter dependency is not
a file or directory. Instead, it consists of a _parameter name_ (or key) in a
_parameters file_, where the _parameter value_ should be found. This allows you
to define [stage](/doc/command-reference/run) dependencies more granularly:
changes to other parts of the params file will not affect the stage. Parameter
dependencies also prevent situations where several stages share a regular
dependency (e.g. a config file), and any change in it invalidates all of them
(see `dvc status`), causing unnecessary re-executions upon `dvc repro`.

The default **parameters file** name is `params.yaml`, but any other YAML 1.2,
JSON, TOML 1.0, or [Python](#examples-python-parameters-file) files can be used
additionally (listed under `params:` as shown in the sample above). These files
are typically written manually (or they can be generated) and they can be
versioned directly with Git.

**Parameter values** should be organized in tree-like hierarchies (dictionaries)
inside params files (see [Examples](#examples)). DVC will interpret param names
as the tree path to find those values. Supported types are: string, integer,
float, boolean, and arrays (groups of params). Note that DVC does not ascribe
any specific meaning to these values.
<admon type="info">

DVC saves parameter names and values to `dvc.lock` in order to track them over
time. They will be compared to the latest params files to determine if the stage
is outdated upon `dvc repro` (or `dvc status`).
See [more details] about this syntax.

</admon>

> Note that DVC does not pass the parameter values to stage commands. The
> commands executed by DVC will have to load and parse the parameters file by
> itself.
Multiple stages of a <abbr>pipeline</abbr> can [use the same params file] as
<abbr>dependency</abbr>, but only certain values will affect each
<abbr>stage</abbr>.

Parameters can also be used for [templating] `dvc.yaml` itself (see also **Dict
Unpacking**), which means you can pass them to your [stage commands] as
command-line arguments. You can also load them in Python code with
`dvc.api.params_show()`.

The `dvc params diff` command is available to show parameter changes, displaying
their current and previous values.

💡 Parameters can also be used for
[templating](/doc/user-guide/project-structure/dvcyaml-files#templating)
`dvc.yaml` itself.
DVC saves parameter names and values to `dvc.lock` in order to track them over
time. They will be compared to the latest params files to determine if the stage
is outdated upon `dvc repro` (or `dvc status`).
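
The comparison described above can be sketched in plain Python. This is only an illustration, assuming the params file has already been parsed into nested dictionaries; the helper names `lookup` and `changed_params` and the sample values are hypothetical, and this is not DVC's actual implementation:

```python
from typing import Any


def lookup(params: dict, dotted_name: str) -> Any:
    """Follow a dotted name such as 'tuning.learning-rate' through nested dicts."""
    node: Any = params
    for key in dotted_name.split("."):
        node = node[key]
    return node


def changed_params(locked: dict, current: dict) -> dict:
    """Return {name: (locked_value, current_value)} for tracked params that differ."""
    diffs = {}
    for name, old in locked.items():
        try:
            new = lookup(current, name)
        except (KeyError, TypeError):
            new = None  # the param no longer exists in the params file
        if new != old:
            diffs[name] = (old, new)
    return diffs


# Values as they might have been recorded at the last run (hypothetical).
locked = {"epochs": 900, "tuning.learning-rate": 0.945}
# Current contents of a params file, already parsed into a dict (hypothetical).
current = {"epochs": 900, "tuning": {"learning-rate": 0.5, "max_depth": 7}}

print(changed_params(locked, current))
```

Only the tracked names are compared, so an edit to `tuning.max_depth` in this sketch would not mark the stage as outdated.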

[hyperparameters]:
/doc/user-guide/experiment-management/running-experiments#tuning-hyperparameters
[use the same params file]:
/doc/user-guide/data-pipelines/defining-pipelines#parameter-dependencies
[more details]: /doc/user-guide/project-structure/dvcyaml-files#parameters
[templating]: /doc/user-guide/project-structure/dvcyaml-files#templating
[stage commands]: /doc/user-guide/project-structure/dvcyaml-files#stage-commands

## Options

@@ -98,9 +105,9 @@ process:
bow: 15000
```

Using `dvc stage add`, define a [stage](/doc/command-reference/run) that depends
on params `lr`, `layers`, and `epochs` from the params file above. Full paths
should be used to specify `layers` and `epochs` from the `train` group:
Using `dvc stage add`, define a <abbr>stage</abbr> that depends on params `lr`,
`layers`, and `epochs` from the params file above. Full paths should be used to
specify `layers` and `epochs` from the `train` group:

```cli
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
@@ -112,7 +119,7 @@ $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
> Python parameters files.

The `train.py` script will have some code to parse and load the needed
parameters. For example, you can use `dvc.api.params_show()`:
parameters. You can use `dvc.api.params_show()` for this:

```py
import dvc.api
@@ -197,9 +204,13 @@ previous version, which is why all `Old` values are `—`.

## Examples: Python parameters file

> ⚠️ Note that complex expressions (unsupported by
> [ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval))
> won't be parsed as DVC parameters.
<admon type="warn">

Note that complex expressions (unsupported by
[ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval))
won't be parsed as DVC parameters.

</admon>

Consider this Python parameters file named `params.py`:

@@ -237,8 +248,8 @@ class TestConfig:
METRICS = ['metric']
```
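
The warning about `ast.literal_eval` can be made concrete with a small sketch. It assumes a hypothetical params source held in the string `SOURCE` and a hypothetical helper `literal_params`; it only shows which assignments `ast.literal_eval` accepts, and it is not how DVC itself reads Python params files:

```python
import ast

# Hypothetical contents of a Python parameters file.
SOURCE = '''
BOOL = True
INT = 5
LAYERS = [128, 64]
SEED = int("42")   # a function call, not a literal
RATE = 0.1 * 3     # an arithmetic expression, not a literal
'''


def literal_params(source: str) -> dict:
    """Collect top-level NAME = <literal> assignments, skipping everything else."""
    found = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name):
            continue
        try:
            found[target.id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            pass  # not a plain literal, so it is left out
    return found


print(literal_params(SOURCE))
```

In this sketch `SEED` and `RATE` are dropped because their right-hand sides are expressions rather than literals.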

The following [stage](/doc/command-reference/run) depends on params `BOOL`,
`INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`:
The following <abbr>stage</abbr> depends on params `BOOL`, `INT`, as well as
`TrainConfig`'s `EPOCHS` and `layers`:

```cli
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
13 changes: 7 additions & 6 deletions content/docs/user-guide/basic-concepts/parameter.md
@@ -1,9 +1,10 @@
---
name: 'Parameter Dependency'
match: [parameter, parameters, param, params, hyperparameter, hyperparameters]
name: 'Parameters'
match: [parameter, parameters]
tooltip: >-
Pipeline stages (defined in `dvc.yaml`) can depend on specific values inside
an arbitrary YAML, JSON, TOML, or Python file (`params.yaml` by default).
Stages are invalid (considered outdated) when any of their parameter values
change. See [`dvc params`](/doc/command-reference/params).
Hyperparameters or other config values used by your code, loaded from a
structured file (`params.yaml` by default). They can be tracked as granular
dependencies for stages of DVC pipelines (defined in `dvc.yaml`). DVC can also
compare them among machine learning experiments (useful for optimization). See
`dvc params`.
---
4 changes: 2 additions & 2 deletions content/docs/user-guide/experiment-management/index.md
@@ -6,8 +6,8 @@ of the development of data features, hyperspace exploration, deep learning
optimization, etc.

Some of DVC's base features already help you codify and analyze experiments.
[Parameters](/doc/command-reference/params) are simple values in a formatted
text file which you can tweak and use in your code. On the other end,
[Parameters](/doc/command-reference/params) are values in a structured text
file, which you can tweak and use in your code. On the other end,
[metrics](/doc/command-reference/metrics) (and
[plots](/doc/command-reference/plots)) let you define, visualize, and compare
quantitative measures of your results.
@@ -20,8 +20,8 @@ experiment(s). These files codify _pipelines_ that specify one or more

### Running the pipeline(s)

You can run the experiment pipeline using `dvc exp run`. It uses `./dvc.yaml`
(in the current directory) by default.
You can run the experiment <abbr>pipelines</abbr> using `dvc exp run`. It uses
`./dvc.yaml` (in the current directory) by default.

```dvc
$ dvc exp run
@@ -45,20 +45,20 @@ once.
> 📖 `dvc exp run` is an experiment-specific alternative to `dvc repro`.

[reproduction targets]: /doc/command-reference/repro#options
[dependency graph]:
/doc/user-guide/data-pipelines/defining-pipelines#directed-acyclic-graph
[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines

## Tuning (hyper)parameters

Parameters are the values that modify the behavior of coded processes -- in this
case producing different experiment results. Machine learning experimentation
often involves defining and searching hyperparameter spaces to improve the
resulting model metrics.
Parameters are any values used inside your code to tune modeling attributes, or
that affect experiment results in any other way. For example, a [random forest
classifier] may require a _maximum depth_ value. Machine learning
experimentation often involves defining and searching hyperparameter spaces to
improve the resulting model metrics.

In DVC project source code, <abbr>parameters</abbr> should be read from _params
files_ (`params.yaml` by default) and defined in `dvc.yaml`. When a tracked
param value has changed, `dvc exp run` invalidates any stages that depend on it,
and reproduces them.
Your source code should read params from structured [parameters files]
(`params.yaml` by default). Define them with the `params` field of `dvc.yaml`
for DVC to track them. When a param value has changed, `dvc exp run` invalidates
any stages that depend on it, and reproduces them.

> 📖 See `dvc params` for more details.

@@ -80,6 +80,11 @@ $ dvc exp run -S learning_rate=0.001 -S units=128 # set multiple params
...
```
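
To make the effect of `key=value` overrides such as those passed with `-S` easier to picture, here is a rough sketch that applies them to a nested dictionary. The function `apply_override`, the `train.epochs` key, and the value parsing with `ast.literal_eval` are all assumptions made for illustration; DVC's own parsing of `--set-param` may differ:

```python
import ast


def apply_override(params: dict, assignment: str) -> None:
    """Apply a 'dotted.key=value' override to a nested dict in place."""
    name, _, raw = assignment.partition("=")
    try:
        value = ast.literal_eval(raw)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError):
        value = raw  # anything else is kept as a plain string
    *parents, leaf = name.split(".")
    node = params
    for key in parents:
        node = node.setdefault(key, {})  # create missing groups on the way
    node[leaf] = value


# Hypothetical parsed contents of params.yaml.
params = {"learning_rate": 0.01, "units": 64, "train": {"epochs": 10}}
for item in ["learning_rate=0.001", "units=128", "train.epochs=20"]:
    apply_override(params, item)

print(params)
```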

[random forest classifier]:
https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d
[parameters files]:
/doc/user-guide/project-structure/dvcyaml-files#parameters-files

## Experiment results

The results of the last `dvc exp run` can be seen in the <abbr>workspace</abbr>.
11 changes: 6 additions & 5 deletions content/docs/user-guide/pipelines/defining-pipelines.md
@@ -186,10 +186,10 @@ changed for the purpose of stage invalidation.
## Parameter dependencies

A more granular type of dependency is the parameter (`params` field of
`dvc.yaml`), or _hyperparameters_ in machine learning. These represent simple
values used inside your code to tune data processing, or that affect stage
execution in any other way. For example, training a [Neural Network] usually
requires _batch size_ and _epoch_ values.
`dvc.yaml`), or _hyperparameters_ in machine learning. These are any values used
inside your code to tune data processing, or that affect stage execution in any
other way. For example, training a [Neural Network] usually requires _batch
size_ and _epoch_ values.

Instead of hard-coding param values, your code can read them from a structured
file (e.g. YAML format). DVC can track any key/value pair in a supported
@@ -228,7 +228,8 @@ Use `dvc params diff` to compare parameters across project versions.
Stage outputs are files (or directories) written by <abbr>pipelines</abbr>, for
example machine learning models, intermediate artifacts, as well as data [plots]
and performance [metrics]. These files are <abbr>cached</abbr> by DVC
automatically, and tracked with the help of `dvc.lock` files.
automatically, and tracked with the help of `dvc.lock` files (or `.dvc` files,
see `dvc add`).

Outputs can be dependencies of subsequent stages (as explained earlier). So when
they change, DVC may need to reproduce downstream stages as well (handled