Misc. updates (2.0ish) #2062

Merged
merged 41 commits into from
Jan 5, 2021
Commits
1cf6519
Merge branch 'master' into jorge
jorgeorpinel Dec 21, 2020
8dd77a9
config: standardize sample variable/option values
jorgeorpinel Dec 22, 2020
df113ec
Merge branch 'master' into jorge
jorgeorpinel Dec 22, 2020
4dd5322
cmd: misc updates to repro, gc, run
jorgeorpinel Dec 22, 2020
fa30e85
Merge branch 'jorge' of github.com:iterative/dvc.org into jorge
jorgeorpinel Dec 22, 2020
27b3007
Merge branch 'master' into jorge
jorgeorpinel Dec 23, 2020
2e370de
Merge branch 'jorge' of github.com:iterative/dvc.org into jorge
jorgeorpinel Dec 23, 2020
ec5cb83
Merge branch 'master' into jorge
jorgeorpinel Dec 27, 2020
e7297b5
blog: manually format a md string in frontmatter
jorgeorpinel Dec 27, 2020
523af6b
guide: multi cmd in main dvc.yaml example
jorgeorpinel Dec 27, 2020
442d325
guide: copy edit in dvc.yaml
jorgeorpinel Dec 27, 2020
3e5435a
cmd: remove term "self-incrementing"
jorgeorpinel Dec 27, 2020
95965ea
guide: re-instate deleted changes
jorgeorpinel Dec 27, 2020
7a59869
cmd: review params refs
jorgeorpinel Dec 27, 2020
a7f35ef
blog: fix broken frontmatter
jorgeorpinel Dec 27, 2020
194e1e4
cmd: update params def. explanation in diff like it is in index
jorgeorpinel Dec 27, 2020
4e1cb47
cmd: absorb a params diff note into a p
jorgeorpinel Dec 27, 2020
8ede12f
cmd: more improvements and reord to params docs
jorgeorpinel Dec 28, 2020
0930fa4
guide: expand on dvc.yaml params field (multiple params files)
jorgeorpinel Dec 28, 2020
bf0f04c
guide: remove emoji we never use from recs
jorgeorpinel Dec 28, 2020
59fe782
Merge branch 'master' into jorge
jorgeorpinel Dec 29, 2020
082d52d
blog: remove prettier-ignore
jorgeorpinel Dec 29, 2020
7165723
cmd: differentiate between params and param deps in params refs
jorgeorpinel Dec 29, 2020
917b0ab
cmd: start with example in params index
jorgeorpinel Dec 29, 2020
7dc0596
cmd: move params value info to its section
jorgeorpinel Dec 29, 2020
3cdd408
cmd: review --targets arg descs.
jorgeorpinel Dec 29, 2020
64bf6b7
cmd: params diff --targets don't expand anything
jorgeorpinel Dec 29, 2020
dc38b81
Merge branch 'master' into jorge
jorgeorpinel Jan 2, 2021
919f6fd
cmd: remove 3rd mention of repro in params Desc
jorgeorpinel Jan 2, 2021
0579261
cmd: update params diff --targets
jorgeorpinel Jan 2, 2021
5d38ea0
cmd: update metrics diff --targets
jorgeorpinel Jan 2, 2021
edf7565
cmd: std. --targets option accross refs
jorgeorpinel Jan 2, 2021
a163a35
cmd: simplify params Desc and fix Examples
jorgeorpinel Jan 2, 2021
25834cc
cmd: make params intro sample realistic
jorgeorpinel Jan 2, 2021
64b31de
cmd: clarify default behavior of params diff
jorgeorpinel Jan 2, 2021
cbd6d62
cmd: clarify about params/metrics/plots diff --tagets
jorgeorpinel Jan 2, 2021
53ed2d2
cmd: typo
jorgeorpinel Jan 2, 2021
d8221e0
Merge branch 'master' into jorge
jorgeorpinel Jan 3, 2021
19c846a
cmd: note that metrics/params diff work in any Git repo
jorgeorpinel Jan 3, 2021
ff5665d
cmd: clarify more about default params used by diff
jorgeorpinel Jan 3, 2021
3e7d40f
cmd: final details on params index
jorgeorpinel Jan 3, 2021
8 changes: 4 additions & 4 deletions content/blog/2020-04-06-april-20-dvc-heartbeat.md
@@ -14,10 +14,10 @@ descriptionLong: |
projects by our users and big ideas about best practices in ML and data
science.
picture: 2020-04-06/april_header.png
pictureComment:
A view from [Barrancas del
Cobre](https://en.wikipedia.org/wiki/Copper_Canyon), shot by Jorge Orpinel
Pérez. Jorge has mastered the art of working on DVC remotely.
pictureComment: |
A view from
[Barrancas del Cobre](https://en.wikipedia.org/wiki/Copper_Canyon), shot by
Jorge Orpinel Pérez. Jorge has mastered the art of working on DVC remotely.
author: elle_obrien
commentsUrl: https://discuss.dvc.org/t/april-20-heartbeat/347
tags:
13 changes: 4 additions & 9 deletions content/docs/command-reference/diff.md
@@ -44,18 +44,13 @@ for example when `dvc init` was used with the `--no-scm` option.

## Options

- `--targets <paths>` - limit command scope to these paths. When specifying
arguments for `--targets` before `a_rev`/`b_rev`, you should use `--` after
this option's arguments, e.g.:
- `--targets <paths>` - specific DVC-tracked files to compare.

```dvc
$ dvc diff --targets t1.json t2.yaml -- HEAD v1
```

Alternatively, you can also run the above statement as:
When specifying arguments for `--targets` before `a_rev`/`b_rev`, you should
use `--` after this option's arguments (POSIX terminals), e.g.:

```dvc
$ dvc diff HEAD v1 --targets t1.json t2.json
$ dvc diff --targets t1.json t2.yaml -- HEAD v1
```
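Why the `--` separator matters can be illustrated with a small sketch. It assumes a command-line parser built with Python's `argparse`, where `--targets` accepts any number of values; the parser below is a hypothetical stand-in for illustration, not DVC's actual CLI code.

```python
import argparse

# Hypothetical stand-in for a diff-like CLI; not DVC's actual parser.
parser = argparse.ArgumentParser(prog="diff-sketch")
parser.add_argument("--targets", nargs="*", default=[])
parser.add_argument("a_rev", nargs="?", default=None)
parser.add_argument("b_rev", nargs="?", default=None)

# Without `--`, the multi-value option swallows the revisions as well.
greedy = parser.parse_args(["--targets", "t1.json", "t2.yaml", "HEAD", "v1"])

# With `--`, everything after it is treated as positional arguments.
split = parser.parse_args(
    ["--targets", "t1.json", "t2.yaml", "--", "HEAD", "v1"]
)

# Giving the revisions first leaves no ambiguity for the parser.
rev_first = parser.parse_args(
    ["HEAD", "v1", "--targets", "t1.json", "t2.json"]
)
```

In this sketch, `greedy.a_rev` ends up empty because `HEAD` and `v1` were read as targets, while `split` and `rev_first` assign the revisions as intended.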

- `--show-json` - prints the command's output in easily parsable JSON format,
4 changes: 2 additions & 2 deletions content/docs/command-reference/get.md
@@ -24,8 +24,8 @@ target file or directory (found at `path` in `url`) to the current working
directory. (Analogous to `wget`, but for repos.)

> Note that unlike `dvc import`, this command does not track the downloaded
> files (does not create a `.dvc` file). For that reason, this command doesn't
> require an existing DVC project to run in.
> files (does not create a `.dvc` file). For that reason, it doesn't require an
> existing DVC project to run in.

> See `dvc list` for a way to browse repository contents to find files or
> directories to download.
25 changes: 13 additions & 12 deletions content/docs/command-reference/metrics/diff.md
@@ -31,29 +31,30 @@ specified, `dvc metrics diff` compares metrics currently present in the
(required). A single specified revision results in comparing the workspace and
that version.

> Note that unlike `dvc diff`, this command doesn't always need DVC files to
> find metrics files (see `--targets` option). For that reason, it doesn't
> require an existing DVC project to run in. It can work in any Git repo.

Another way to display metrics is the `dvc metrics show` command, which just
lists all the current metrics, without comparisons.

## Options

- `--targets <paths>` - limit command scope to these metrics files. Using `-R`,
directories to search metrics files in can also be given. When specifying
arguments for `--targets` before `revisions`, you should use `--` after this
option's arguments, e.g.:

```dvc
$ dvc metrics diff --targets t1.json t2.yaml -- HEAD v1
```
- `--targets <paths>` - specific metrics files to compare. It accepts `paths` to
any valid metrics file, regardless of whether `dvc.yaml` is currently tracking
any metrics in them. Using `-R`, directories to search metrics files in can
also be given.

Alternatively, you can also run the above statement as:
When specifying arguments for `--targets` before `revisions`, you should use
`--` after this option's arguments (POSIX terminals), e.g.:

```dvc
$ dvc metrics diff HEAD v1 --targets t1.json t2.json
$ dvc metrics diff --targets t1.json t2.yaml -- HEAD v1
```

- `-R`, `--recursive` - determines the metrics files to use by searching each
target directory and its subdirectories for DVC-`dvc.yaml` files to inspect.
If there are no directories among the `targets`, this option is ignored.
target directory and its subdirectories for DVC-tracked files to inspect. If
there are no directories among the `targets`, this option is ignored.

- `--all` - list all metrics, even those without changes.

41 changes: 20 additions & 21 deletions content/docs/command-reference/params/diff.md
@@ -1,13 +1,13 @@
# params diff

Show changes in [parameter dependencies](/doc/command-reference/params) between
commits in the <abbr>DVC repository</abbr>, or between a commit and the
Show changes in [parameters](/doc/command-reference/params) between commits in
the <abbr>DVC repository</abbr>, or between a commit and the
<abbr>workspace</abbr>.

## Synopsis

```usage
usage: dvc params diff [-h] [-q | -v] [--targets [<path> [<path> ...]]]
usage: dvc params diff [-h] [-q | -v] [--targets [<paths> [<paths> ...]]]
[--all] [--show-json] [--show-md] [--no-path]
[a_rev] [b_rev]

@@ -19,35 +19,34 @@ positional arguments:

## Description

This command provides a quick way to compare parameter values among experiments
in the repository history. Requires that Git is being used to version the
project params.
Provides a quick way to compare parameter values among experiments in the
repository history. Requires that Git is being used to version the project
params.

> Parameter dependencies are defined with the `-p` option in `dvc run`. See also
> `dvc params`.
> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the `-p` (`--params`) option of `dvc run`).

Without arguments, this command compares parameters currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed version.
This includes everything in `params.yaml` (default parameters file) as well as
all the `params` used in `dvc.yaml`. Values in `dvc.lock` are used for comparison.
Only params that have changes are listed.
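The default "changed only" behavior, and what `--all` adds, can be sketched roughly as follows. This is an illustrative approximation and not DVC's implementation: `flatten`, `params_diff`, and the sample values are hypothetical, and the two dictionaries stand in for params loaded from a committed version and from the workspace.

```python
def flatten(tree, prefix=""):
    """Turn nested dicts into {'train.epochs': 70, ...} style names."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def params_diff(old, new, show_all=False):
    """List (name, old value, new value) rows; unchanged rows only if show_all."""
    old_flat, new_flat = flatten(old), flatten(new)
    rows = []
    for name in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(name), new_flat.get(name)
        if show_all or before != after:
            rows.append((name, before, after))
    return rows


# Hypothetical values, for illustration only.
committed = {"lr": 0.0041, "train": {"epochs": 70, "layers": 9}}
workspace = {"lr": 0.0043, "train": {"epochs": 70, "layers": 9}}

changed_only = params_diff(committed, workspace)
everything = params_diff(committed, workspace, show_all=True)
```

Here `changed_only` holds a single row for `lr`, while `everything` also includes the two unchanged `train` params.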

Supported parameter _value_ types are: string, integer, float, and arrays. DVC
itself does not ascribe any specific meaning for these values.

❗ By default it only shows parameters that were changed.
> Note that unlike `dvc diff`, this command doesn't always need DVC files to
> find params files (see `--targets` option). For that reason, it doesn't
> require an existing DVC project to run in. It can work in any Git repo.

## Options

- `--targets <paths>` - limit command scope to these params files. When
specifying arguments for `--targets` before `revisions`, you should use `--`
after this option's arguments, e.g.:

```dvc
$ dvc params diff --targets m1.json m2.yaml -- HEAD v1
```
- `--targets <paths>` - specific params files to compare. It accepts `paths` to
any valid parameters file, regardless of whether `dvc.yaml` is currently
tracking any params in them.

Alternatively, you can also run the above statement as:
When specifying arguments for `--targets` before `a_rev`/`b_rev`, you should
use `--` after this option's arguments (POSIX terminals), e.g.:

```dvc
$ dvc params diff HEAD v1 --targets m1.json m2.json
$ dvc params diff --targets m1.json m2.yaml -- HEAD v1
```

- `--all` - prints all parameters, including those that have not changed.
107 changes: 64 additions & 43 deletions content/docs/command-reference/params/index.md
@@ -18,44 +18,55 @@ positional arguments:

In order to track parameters and hyperparameters associated to machine learning
experiments in <abbr>DVC projects</abbr>, DVC provides a different type of
dependencies: _parameters_. Parameters are defined using the `-p`
(`--params`) option of `dvc run`, using simple names like `epochs`,
dependencies: _parameters_. They usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc.

In contrast to a regular <abbr>dependency</abbr>, a parameter is not a file (or
directory). Instead, it consists of a _parameter name_ (or key) to find inside a
YAML 1.2, JSON, TOML, or [Python](#examples-python-parameters-file) _parameters
file_. Multiple parameter dependencies can be specified from one or more
parameters files.

The default parameters file name is `params.yaml`. Parameters should be
organized as a tree hierarchy inside, as DVC will locate param names by their
tree path. Parameters files have to be manually written, or generated, and these
can be versioned directly with Git.
To start tracking parameters, list them under the `params` field of `dvc.yaml`
stages (manually or with the `-p`/`--params` option of `dvc run`). For
example:

Supported parameter _value_ types are: string, integer, float, and arrays. DVC
itself does not ascribe any specific meaning for these values. They are
user-defined, and serve as a way to generalize and parametrize machine
learning algorithms or data processing code.
```yaml
stages:
learn:
cmd: ./deep.py
params:
# Review comment (Member): we use epochs, learning-rate, batch_siz above,
# let's do the same in the example?
# Reply (Contributor Author): Yes, that is way better 😅. Done!

- epochs
- tuning.learning-rate
- myparams.toml:
- batch_size
```

DVC saves the param names and their latest values in the `dvc.yaml` file. These
values will be compared to the ones in the params files to determine if the
stage is invalidated upon pipeline [reproduction](/doc/command-reference/repro).
In contrast to a regular <abbr>dependency</abbr>, a parameter dependency is not
a file or directory. Instead, it consists of a _parameter name_ (or key) in a
_parameters file_, where the _parameter value_ should be found. This allows you
to define [stage](/doc/command-reference/run) dependencies more granularly:
changes to other parts of the params file will not affect the stage. Parameter
dependencies also prevent situations where several stages share a regular
dependency (e.g. a config file), and any change in it invalidates all these
stages, causing unnecessary re-executions upon `dvc repro`.

The default **parameters file** name is `params.yaml`, but any other YAML 1.2,
JSON, TOML, or [Python](#examples-python-parameters-file) file can also be used
(listed under `params:` with a sub-list of param names, as shown in the sample
above). These files are typically written manually (or they can be generated)
and they can be versioned directly with Git.

**Parameter values** should be organized in tree-like hierarchies (dictionaries)
inside param files (see [Examples](#examples)). DVC will interpret param names
as the tree path to find those values. Supported types are: string, integer,
float, and arrays (groups of params). Note that DVC does not ascribe any
specific meaning to these values.
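As a rough illustration of the "tree path" idea, the following sketch resolves a dotted param name inside a nested dictionary. The helper `get_param` and the sample values are hypothetical; this is not how DVC is implemented, only a picture of the lookup it describes.

```python
from functools import reduce


def get_param(params: dict, name: str):
    """Follow a dotted name such as 'train.epochs' through nested dicts."""
    return reduce(lambda node, key: node[key], name.split("."), params)


# Hypothetical contents of a params file, already loaded into a dict.
params = {"lr": 0.0041, "train": {"epochs": 70, "layers": 9}}

lr = get_param(params, "lr")                # top-level param
epochs = get_param(params, "train.epochs")  # full path into the `train` group
group = get_param(params, "train")          # a whole group of params
```

A name that does not exist in the tree raises `KeyError` in this sketch, which is one simple way to surface a misspelled param.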

DVC saves parameter names and values to `dvc.lock` in order to track them over
time. They will be compared to the latest params files to determine if the stage
is outdated upon `dvc repro` (or `dvc status`).
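The comparison described here can be pictured with a minimal sketch. It assumes the saved values are available as a flat mapping of param names to values, which is a simplification and not the real `dvc.lock` layout; `lookup`, `changed_params`, and the sample values are hypothetical.

```python
def lookup(tree, name):
    """Resolve a dotted param name in a nested dict (KeyError if missing)."""
    node = tree
    for key in name.split("."):
        node = node[key]
    return node


def changed_params(locked, current):
    """Names of tracked params whose current value differs from the saved one."""
    return [
        name for name, value in locked.items()
        if lookup(current, name) != value
    ]


# Simplified stand-in for the values saved at the last run.
locked = {"lr": 0.0041, "train.epochs": 70}

# Hypothetical latest contents of the params file.
current = {
    "lr": 0.0041,
    "train": {"epochs": 90, "layers": 12},
    "process": {"bow": 15000},
}

outdated = changed_params(locked, current)
```

Only `train.epochs` is reported: `train.layers` and `process.bow` are not tracked by this stage, so edits to them do not mark it as outdated.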

> Note that DVC does not pass the parameter values to stage commands. The
> associated command executed by `dvc run` or `dvc repro` will have to open and
> parse the parameters file by itself, and use the params specified with `-p`.
> commands executed by DVC will have to load and parse the parameters file
> themselves.

The parameters concept helps to define [stage](/doc/command-reference/run)
dependencies more granularly. A particular parameter or set of parameters will
be required for the stage invalidation (see `dvc status` and `dvc repro`).
Changes to other parts of the dependency file will not affect the stage. This
prevents situations where several stages share a (configuration) file as a
common dependency, and any change in this dependency invalidates all these
stages and causes their reproduction unnecessarily.

`dvc params diff` is available to show changes in parameters, displaying the
param names as well as their current and previous values.
The `dvc params diff` command is available to show parameter changes, displaying
their current and previous values.

## Options

@@ -82,9 +93,9 @@ process:
bow: 15000
```

Define a [stage](/doc/command-reference/run) that depends on params `lr`,
`layers`, and `epochs` from the params file above. Full paths should be used to
specify `layers` and `epochs` from the `train` group:
Using `dvc run`, define a [stage](/doc/command-reference/run) that depends on
params `lr`, `layers`, and `epochs` from the params file above. Full paths
should be used to specify `layers` and `epochs` from the `train` group:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
@@ -95,8 +106,8 @@ $ dvc run -n train -d users.csv -o model.pkl \
> Note that we could use the same parameter addressing with JSON, TOML, or
> Python parameters files.

The `train.py` script will have some code to parse the needed parameters. For
example:
The `train.py` script will have some code to parse and load the needed
parameters. For example:

```py
import yaml
@@ -109,34 +120,44 @@ epochs = params['train']['epochs']
layers = params['train']['layers']
```

You can find that each parameter and its value were saved to `dvc.yaml`. These
values will be compared to the ones in the parameters files whenever `dvc repro`
is used, to determine if dependency to the params file is invalidated:
You can find that each parameter was defined in `dvc.yaml`, as well as saved to
`dvc.lock` along with the values. These are compared to the params files when
`dvc repro` is used, to determine if the parameter dependency has changed.

```yaml
# dvc.yaml
stages:
train:
cmd: python train.py
deps:
- users.csv
params:
- lr
- train
- train.epochs
- train.layers
outs:
- model.pkl
```

Alternatively, the entire group of parameters `train` can be referenced, instead
of specifying each of the group parameters separately:
of specifying each of the params separately:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
-p lr,train \
python train.py
```

```yaml
# in dvc.yaml
params:
- lr
- train
```

In the examples above, the default parameters file name `params.yaml` was used.
This file name can be redefined with a prefix in the `-p` argument:
Note that this file name can be redefined using a prefix in the `-p` argument of
`dvc run`. In our case:

```dvc
$ dvc run -n train -d logs/ -o users.csv \
@@ -187,7 +208,7 @@ $ dvc run -n train -d users.csv -o model.pkl \
python train.py
```

Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` list):
Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` lists):

```yaml
stages:
29 changes: 13 additions & 16 deletions content/docs/command-reference/plots/diff.md
@@ -6,7 +6,8 @@ overlaying them in a single image. This allows to compare them easily.
## Synopsis

```usage
usage: dvc plots diff [-h] [-q | -v] [--targets [<path> [<path> ...]]]
usage: dvc plots diff [-h] [-q | -v]
[--targets [<paths> [<paths> ...]]]
[-t <name_or_path>] [-x <field>] [-y <field>]
[--no-header] [--title <text>]
[--x-label <text>] [--y-label <text>] [-o <path>]
@@ -24,7 +25,7 @@ This command is a way to visualize the "difference" between
versions of the <abbr>repository</abbr>, by overlaying them in a single plot.

> Note that unlike `dvc metrics diff`, this command does not calculate numeric
> differences between metrics file values.
> differences between plots file values.

`revisions` are Git commit hashes, tag, or branch names. If none are specified,
`dvc plots diff` compares plots currently present in the <abbr>workspace</abbr>
@@ -34,14 +35,13 @@ revision results in comparing the workspace and that version.
💡 Note that any number of `revisions` can be provided (the resulting plot shows
all of them in a single image).

All plots defined in `dvc.yaml` are used by default, but specific plots files
can be specified with the `--targets` option (note that targets don't
necessarily have to be defined in `dvc.yaml`).
All plots defined in `dvc.yaml` are used by default, but specific files can be
specified with the `--targets` option (any valid plots file is accepted).

The plot style can be customized with
[plot templates](/doc/command-reference/plots#plot-templates), using the
`--template` option. To learn more about metrics file formats and templates
please see `dvc plots`.
`--template` option. To learn more about plots files and templates please see
`dvc plots`.

> Note that the default behavior of this command can be modified per metrics
> file with `dvc plots modify`.
@@ -51,18 +51,15 @@ all the current plots, without comparisons.

## Options

- `--targets <path>` - specific metrics files to visualize. When specifying
arguments for `--targets` before `revisions`, you should use `--` after this
option's arguments, e.g.:
- `--targets <paths>` - specific plots files to visualize. It accepts `paths` to
any valid plots file, regardless of whether `dvc.yaml` is currently tracking
any plots in them.

```dvc
$ dvc plots diff --targets t1.json t2.csv -- HEAD v1 v2
```

Alternatively, you can also run the above statement as:
When specifying arguments for `--targets` before `revisions`, you should use
`--` after this option's arguments, e.g.:

```dvc
$ dvc plots diff HEAD v1 v2 --targets t1.json t2.csv
$ dvc plots diff --targets t1.json t2.csv -- HEAD v1 v2
```

- `-o <path>, --out <path>` - name of the generated file. By default, the output