Regular updates & plots 1.0 update #1382

Merged on Jun 11, 2020 (64 commits)
Changes from 50 commits
Commits
8030c46
cmd ref: add note that move creates dirs
jorgeorpinel May 31, 2020
5bffb49
cmd ref: improve structure of add ref desc.
jorgeorpinel May 31, 2020
431ffc0
grammar: add some commas
jorgeorpinel May 31, 2020
3f2d554
term: checksum -> hash value in dvcignore guide
jorgeorpinel May 31, 2020
86b8f62
style: lower case bullet text
jorgeorpinel Jun 1, 2020
b75f314
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 1, 2020
47ad311
cmd ref: remove some redundancy in metrics index
jorgeorpinel Jun 1, 2020
2f1e09c
cmd ref: update plots refs synopsis and descriptions
jorgeorpinel Jun 1, 2020
3a723a0
Add plots modify cmd
dmpetrov Jun 2, 2020
bc54b22
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 2, 2020
177de9b
Merge branch '2020-05-31' of github.com:iterative/dvc.org into 2020-0…
jorgeorpinel Jun 2, 2020
f59eea9
typo: CSV->csv
jorgeorpinel Jun 3, 2020
c73f7c0
term: working tree -> workspace
jorgeorpinel Jun 3, 2020
140608c
cmd ref: couple improvements to add ref
jorgeorpinel Jun 3, 2020
0028b53
Update config/prismjs/dvc-commands.js
jorgeorpinel Jun 3, 2020
8e572a5
cmd ref: update plots modify description
jorgeorpinel Jun 3, 2020
434fe96
cmd ref: add plots modify to nav, with a few more improvements
jorgeorpinel Jun 3, 2020
5108e14
cmd ref: plots --show-json -> --show-vega
jorgeorpinel Jun 3, 2020
fba18c3
rename x-lab to x-label
dmpetrov Jun 4, 2020
a75dda2
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 6, 2020
30493ab
cmd ref: review descriptions of plots index, show, and diff
jorgeorpinel Jun 6, 2020
d33aecc
cmd ref: review and update old plots cmds options
jorgeorpinel Jun 6, 2020
e385a02
cmd ref: a couple more option updates
jorgeorpinel Jun 6, 2020
22b648c
cmd ref: emphasize add works with any large file/dir
jorgeorpinel Jun 6, 2020
f3cba9c
cmd ref: updae plots modify top half (definition, description)
jorgeorpinel Jun 6, 2020
532184b
cmd ref: improve all plot cmd option descriptions
jorgeorpinel Jun 6, 2020
4d9cd71
Update content/docs/command-reference/plots/modify.md
jorgeorpinel Jun 7, 2020
791dbd2
cmd ref: review examples (mainly images) in plots modify
jorgeorpinel Jun 8, 2020
697ab77
Merge branch '2020-05-31' of github.com:iterative/dvc.org into 2020-0…
jorgeorpinel Jun 8, 2020
3eb3858
cmd ref: rephrase info about how data arrays are injected to plot tem…
jorgeorpinel Jun 8, 2020
90cac8e
cmd ref: update info on how targets for for plots show/diff
jorgeorpinel Jun 8, 2020
94027c5
cmd ref: double check all plots examples
jorgeorpinel Jun 8, 2020
19fbe99
cmd ref: remove info about plots show --select
jorgeorpinel Jun 8, 2020
c51e2ad
cmd ref: update add desc
jorgeorpinel Jun 8, 2020
13dce7c
cmd ref: re-explain dvc add for dirs
jorgeorpinel Jun 8, 2020
3a418f6
cmd ref: improve description about targets in plots diff
jorgeorpinel Jun 8, 2020
4729c0d
cmd ref: make emoji note in plots index
jorgeorpinel Jun 8, 2020
5a92508
cmd ref: remove ineffective CSV code block highlighting from plots refs
jorgeorpinel Jun 8, 2020
aa620cc
get started: improve intro in index
jorgeorpinel Jun 8, 2020
1357d22
glossary: remove external deps entry (no need)
jorgeorpinel Jun 8, 2020
74c5234
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 8, 2020
9e26518
cmd ref: add info about column indexing for headerless tables
jorgeorpinel Jun 9, 2020
0bfb2ad
cmd ref: update template metavar for plots subcommands
jorgeorpinel Jun 9, 2020
ae1d911
cmd ref: mention YAML is supported for plots
jorgeorpinel Jun 9, 2020
da639ef
cmd ref: rename template metavar again in plots
jorgeorpinel Jun 9, 2020
a467891
cmd ref: clarify plots modify --no-csv-header
jorgeorpinel Jun 9, 2020
f6a2e4f
cmd ref: add note about plots modify in show and diff
jorgeorpinel Jun 9, 2020
a613233
cmd ref: update all plots options again
jorgeorpinel Jun 9, 2020
ba94d6d
cmd ref: more updates to plots et al. per Ivan's review
jorgeorpinel Jun 9, 2020
0d5f502
cmd ref: multiple plots diff --targets allowed
jorgeorpinel Jun 9, 2020
439f8e0
cmd ref: update note about detault metrics in index
jorgeorpinel Jun 10, 2020
7bce905
cmd ref: emphasize add --recursive is rarely needed
jorgeorpinel Jun 10, 2020
edd0323
cmd ref: plots diff: update revisions arg desc
jorgeorpinel Jun 10, 2020
c0ce897
cmd ref: mention column names and numbers in plots {cmd} -x and -y
jorgeorpinel Jun 10, 2020
6bcf06d
cmd ref: emphasize that metrics diff is not a real diff
jorgeorpinel Jun 10, 2020
b0f49c3
cmd ref: simplify note on plots targets
jorgeorpinel Jun 10, 2020
471a565
cmd ref: how to id colmns in plots modify --no-csv-header
jorgeorpinel Jun 10, 2020
9169967
cmd ref: add default target behavior to plots show and diff
jorgeorpinel Jun 10, 2020
de9110a
cmd ref: rename plots option --no-header
jorgeorpinel Jun 10, 2020
1bcae8d
cmd ref: term: prop->property (plots)
jorgeorpinel Jun 10, 2020
2c50676
cmd ref: more details on metrics index
jorgeorpinel Jun 11, 2020
dc0725c
cmd ref: more details on plots index
jorgeorpinel Jun 11, 2020
1970fc3
cmd ref: note about disply props in plots modify
jorgeorpinel Jun 11, 2020
cd602c9
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 11, 2020
1 change: 1 addition & 0 deletions config/prismjs/dvc-commands.js
@@ -22,6 +22,7 @@ module.exports = [
'pull',
'pkg',
'plots show',
'plots modify',
'plots diff',
'plots',
'pipeline show',
60 changes: 29 additions & 31 deletions content/docs/command-reference/add.md
@@ -16,23 +16,27 @@ positional arguments:
## Description

The `dvc add` command is analogous to `git add`, in that it makes DVC aware of
the target data, as a first step to version it. It creates a
the target data, in order to start versioning it. It creates a
[`.dvc` file](/doc/user-guide/dvc-file-format) to track the added data.

The `targets` are files or directories to add with this command, that are turned
into <abbr>data artifacts</abbr> of the <abbr>project</abbr>. By default, these
are committed to the <abbr>cache</abbr> (use the `--no-commit` option to avoid
this, and `dvc commit` to finish the process when needed).
This command can be used to
[version control](/doc/use-cases/versioning-data-and-model-files) large files,
models, dataset directories, etc. that are too big for Git.

Note that [external data](/doc/user-guide/managing-external-data) (targets
outside the <abbr>workspace</abbr>) is supported.
The `targets` are the files or directories to add, which are turned into
<abbr>data artifacts</abbr> of the <abbr>project</abbr>. These are stored in the
<abbr>cache</abbr> by default (use the `--no-commit` option to avoid this, and
`dvc commit` to finish the process when needed).

> See also `dvc run` for more advanced ways to version intermediate and final
> results (like ML models).

Under the hood, a few actions are taken for each file (or directory) in
`targets`:

1. Calculate the file hash.
2. Move the file contents to the cache directory (by default in `.dvc/cache`),
using the file hash to form the cached file names. (See
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file names. (See
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details
@@ -59,34 +63,28 @@ files that can be tracked with Git. See
To avoid adding files inside a directory accidentally, you can add the
corresponding [patterns](/doc/user-guide/dvcignore) in a `.dvcignore` file.

By default DVC tries to use reflinks (see
By default, DVC tries to use reflinks (see
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to avoid copying any file contents and to optimize `.dvc` file operations for
large files. DVC also supports other link types for use on file systems without
`reflink` support, but they have to be specified manually. Refer to the
`cache.type` config option in `dvc config cache` for more information.

A `dvc add` target can be an individual file or a directory. There are two ways
to work with directory hierarchies with `dvc add`:

1. With `dvc add --recursive`, the hierarchy is traversed and every file is
added individually as described above. This means every file has its own
`.dvc` file, and a corresponding cached file is created (unless the
`--no-commit` option is used).
2. When not using `--recursive` a `.dvc` file is created for the top of the
directory (with default name `dirname.dvc`). Every file in the hierarchy is
added to the cache (unless the `--no-commit` option is used), but DVC does
not produce individual `.dvc` files for each file in the directory tree.
Instead, the single `.dvc` file references a special JSON file in the cache
(with `.dir` extension), that in turn points to the files added from the
hierarchy.

`dvc add` is typically used to version control raw data or initial datasets from
which data processing [pipelines](/doc/command-reference/pipeline) are built,
but it can be used to track any large file or directory. We recommend using
`dvc run` to version control intermediate and final results (like ML models).
This way you bring data provenance and make your project
[reproducible](/doc/command-reference/repro).
### Tracking directories

A `dvc add` target can be an individual file or a directory. In the latter case,
a DVC-file is created for the top of the directory (with default name
`<dir_name>.dvc`).

Every file in the hierarchy is added to the cache (unless the `--no-commit`
option is used), but DVC does not produce individual DVC-files for each file in
the directory tree. Instead, the single DVC-file references a special JSON file
in the cache (with `.dir` extension), that in turn points to the added files.

As an alternative, using the `--recursive` option every file is added
individually. This means that each file will have a corresponding DVC-file in
the same hierarchy, so it may not be desirable for directories with a large
number of files.
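
To make the directory case more concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not DVC's actual implementation: it assumes MD5 as the hash and a manifest made of `md5`/`relpath` entries, and the helper names `file_md5` and `dir_manifest` are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()  # assumption: MD5 is the hash in use
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dir_manifest(root):
    """List every file under root with its hash and relative path.

    The entry layout (md5 + relpath) is an assumption for this sketch.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append({"md5": file_md5(full), "relpath": rel})
    return sorted(entries, key=lambda entry: entry["relpath"])


# Demo on a throwaway directory with two small files
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
with open(os.path.join(demo_root, "a.txt"), "wb") as fobj:
    fobj.write(b"hello")
with open(os.path.join(demo_root, "sub", "b.txt"), "wb") as fobj:
    fobj.write(b"world")

manifest = dir_manifest(demo_root)
print(json.dumps(manifest, indent=2))
```

A single manifest like this is what lets one DVC-file stand for a whole directory, whereas `--recursive` would correspond to one DVC-file per entry.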

## Options

11 changes: 2 additions & 9 deletions content/docs/command-reference/metrics/index.md
@@ -71,22 +71,15 @@ to compare and pick the best performing experiment.

### Default metric files

`dvc metrics` subcommands use all metric files that are specified in `dvc.yaml`
by default. There's no need to specify metric file names to see these metrics.
Metric files can be added to `dvc.yaml` with the `--metrics` (`-m`) or
`--metrics-no-cache` (`-M`) options of `dvc run`, or manually to the `metrics`
section of a stage in `dvc.yaml`:
`dvc metrics` subcommands by default use the metric files specified in
`dvc.yaml` (if any), for example `summary.json` below:

```yaml
stages:
train:
cmd: python train.py
deps:
- users.csv
params:
- epochs
- dropout
- lr
outs:
- model.pkl
metrics:
4 changes: 2 additions & 2 deletions content/docs/command-reference/metrics/show.md
@@ -79,7 +79,7 @@ history use `--all-commits` option:

```dvc
$ dvc metrics show --all-commits
working tree:
workspace:
eval.json:
AUC: 0.66729
error: 0.16982
@@ -100,7 +100,7 @@ Metrics from different branches can be shown by `--all-branches` (`-a`) option:

```dvc
$ dvc metrics show -a
working tree:
workspace:
eval.json:
AUC: 0.66729
error: 0.16982
5 changes: 3 additions & 2 deletions content/docs/command-reference/move.md
@@ -20,8 +20,9 @@ positional arguments:
`dvc move` is useful when a `src` file or directory has previously been added to
the <abbr>project</abbr> with `dvc add`, creating a
[`.dvc` file](/doc/user-guide/dvc-file-format) (with `src` as a dependency).
`dvc move` behaves like `mv src dst`, moving `src` to the given `dst` path, but
it also renames and updates the corresponding `.dvc` file appropriately.
`dvc move` behaves similarly to `mv src dst`, moving `src` to the given `dst`
path, but it also renames and updates the corresponding `.dvc` file
appropriately.

> Note that `src` may be a copy or a
> [link](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
106 changes: 55 additions & 51 deletions content/docs/command-reference/plots/diff.md
@@ -6,76 +6,78 @@ plotting them in a single image.
## Synopsis

```usage
usage: dvc plots diff [-h] [-q | -v] [-t [TEMPLATE]] [-d [DATAFILE]] [-f FILE]
[-s SELECT] [-x X] [-y Y] [--stdout] [--no-csv-header]
[--no-html] [--title TITLE] [--xlab XLAB] [--ylab YLAB]
[revisions [revisions ...]]
usage: dvc plots diff [-h] [-q | -v] [--targets [<path> [<path> ...]]]
[-t <name_or_path>] [-x <field>] [-y <field>]
[--no-csv-header] [--title <text>]
[--x-label <text>] [--y-label <text>] [-o <path>]
[--show-vega]
[revisions [revisions ...]]

positional arguments:
revisions Git commits to plot from
revisions Git commits to plot from/to
Member: what does /to part mean?

Contributor Author: That you can choose the from revision and also the to revision(s).

Member: hmm ... feels really weird

why just "Git commits to read metrics from to plot" is not enough?

Contributor Author: I went for Git commits to find metrics to diff for now (in d22c9c6). Cc @efiop

Contributor Author: UPDATE: Actually I changed diff for compare in edd0323

```

## Description

This command visualize difference between metrics among experiments in the
repository history. Requires that Git is being used to version the metrics
files.
This command is a way to visualize the difference between metrics among
Member: Let's write a sentence or two that it's not a "difference" - it's just multiple graphs on the same plot?

Contributor Author (jorgeorpinel, Jun 10, 2020): Sure. If it's not a difference though, what's the point of having 2 commands for showing plots? This one could be renamed show (supporting multiple revisions) and current show deprecated, no? WDYT @efiop cc @dmpetrov

Contributor Author: Noted that diff is not a real diff in 6bcf06d for now.

p.s. this is another note I remember having written before... May be lost in some other PR and will have to be careful to merge them correctly... or maybe a bad merge somewhere un-did some work here 😢

Contributor Author: p.s. found a parallel ongoing discussion related to this in iterative/dvc#3963 (comment)

Contributor Author: Extracted as an extra bullet in #1414

experiments in the <abbr>repository</abbr> history.

The metrics file needs to be specified through `-d`/`--datafile` option. Also, a
plot can be customized with
[plot templates](/doc/command-reference/plots#plot-templates) using the
`--template` option. To learn more about the file formats and templates please
see `dvc plots`.
Target metric files can be specified with the `--targets` option. These
should be <abbr>outputs</abbr> of one of the project stages (see the `--plots`
option of `dvc run`), listed in a
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories) file.

`revisions` are Git commit hashes, tag, or branch names. If none are specified,
`dvc plots diff` compares metrics currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed version.
A single specified revision results in plotting the difference in metrics
between the workspace and that version.
`dvc plots diff` compares targets currently present in the
<abbr>workspace</abbr> (uncommitted changes) with their latest committed
versions (required). A single specified revision results in plotting the
difference between the workspace and that version.

In contrast to commands such as `git diff`, `dvc metrics diff` and
`dvc params diff`, **any number of `revisions` can be provided**, and the
resulting plot shows all of them in a single output.
Note that any number of `revisions` can be provided, and the resulting plot
Member: omit "Note that" ?

Member: (it's not a Note at the moment)

Contributor Author: It's not a block quote, correct. But it's a note as in: it's a small detail we want to make sure the reader notes.

We have another 49 instances of "Note that" in docs that are not in a block quote 😬

shows all of them in a single output.

This command can work with metric files that are committed to a repository
history, data files controlled by DVC, or any other file in the workspace. In
the case of DVC-tracked `datafile`, the `revisions` are used to find the
corresponding [DVC-files](/doc/user-guide/dvc-file-format).
The plot style can be customized with
[plot templates](/doc/command-reference/plots#plot-templates), using the
`--template` option. To learn more about metric file formats and templates
please see `dvc plots`.

> Note that the default behavior of this command can be modified per metrics
> file with `dvc plots modify`.
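
The single-output behavior described above can be pictured as tagging each data point with the revision it came from and concatenating everything into one array. The sketch below is only an illustration in Python, not DVC's code: the function `merge_revisions`, the `rev` field name, and the sample values are assumptions made for this example (the `index` field mirrors the auto-generated one mentioned in the options).

```python
def merge_revisions(data_by_rev):
    """Flatten {revision: [records]} into one list, labeling each record."""
    merged = []
    for rev, records in data_by_rev.items():
        for index, record in enumerate(records):
            point = dict(record)    # copy so the input is left untouched
            point["index"] = index  # position of the point within its revision
            point["rev"] = rev      # assumed field name for the source revision
            merged.append(point)
    return merged


# Hypothetical metric values for two revisions
data = {
    "workspace": [{"loss": 0.9}, {"loss": 0.5}],
    "HEAD": [{"loss": 1.0}, {"loss": 0.7}],
}
points = merge_revisions(data)
for point in points:
    print(point)
```

With every point carrying its revision label, a plotting template can draw one line per revision on the same axes, which is why any number of `revisions` fits in one plot.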

## Options

- `-d [DATAFILE], --datafile [DATAFILE]` - metrics file to visualize.
- `--targets <path>` - metric files to visualize. Shows all plots by default.

- `-o <path>, --out <path>` - name of the generated file. By default, the output
file name is equal to the input filename with a `.html` file extension (or
`.json` when using `--show-vega`).

- `-t [TEMPLATE], --template [TEMPLATE]` -
- `-t <name_or_path>, --template <name_or_path>` -
[plot template](/doc/command-reference/plots#plot-templates) to be injected
with data. The default template is `.dvc/plots/default.json`. See more details
in `dvc plots`.

- `-f FILE, --file FILE` - name of the generated file. By default, the output
file name is equal to the input filename with additional `.html` suffix or
`.json` suffix for `--no-html` mode.

- `--no-html` - do not wrap output Vega specification (JSON) with HTML.

- `-x X` - field name for X axis. An auto-generated `index` field is used by
default.
- `-x <field>` - field name from which the X axis data comes. An
auto-generated `index` field is used by default. See
[Custom templates](/doc/command-reference/plots#custom-templates) for more
information on this `index` field.

- `-y Y` - field name for Y axis. The last column or field found in the
`datafile` is used by default.
- `-y <field>` - field name from which the Y axis data comes. The last
column or field found in the `--targets` is used by default.

- `-s SELECT, --select SELECT` - select which fields or JSONPath to store in the
metrics file [metadata](https://vega.github.io/vega/docs/data/). The
auto-generated, zero-based `index` column is always included.
- `--x-label <text>` - X axis label. The X field name is the default.

- `--xlab XLAB` - X axis title. The X field name is the default title.
- `--y-label <text>` - Y axis label. The Y field name is the default.

- `--ylab YLAB` - Y axis title. The Y field name is the default title.
- `--title <text>` - plot title.

- `--title TITLE` - plot title.
- `--show-vega` - produce a
[Vega specification](https://vega.github.io/vega/docs/specification/) file
instead of HTML. See `dvc plots` for more info.

- `-o, --stdout` - print plot content to stdout.

- `--no-csv-header` - provided CSV or TSV datafile does not have a header.
- `--no-csv-header` - lets DVC know that CSV or TSV `--targets` do not have a
header. A 0-based numeric index can be used to identify each column instead of
names.

- `-h`, `--help` - prints the usage/help message, and exit.

@@ -90,17 +92,19 @@ To visualize the difference between uncommitted changes of a metrics file and
the last commit:

```dvc
$ dvc plots diff -d logs.csv
$ dvc plots diff --targets logs.csv --x-label x
file:///Users/dmitry/src/plots/logs.html
```

![](/img/plots_auc.svg)

> Note that we renamed the X axis label with option `--x-label x`.

The difference between two versions (commit hashes, tags, or branches can be
provided):

```dvc
$ dvc plots diff -d logs.csv HEAD 0135527
$ dvc plots diff --targets logs.csv HEAD 0135527
file:///Users/usr/src/plots/logs.csv.html
```

@@ -110,7 +114,7 @@ file:///Users/usr/src/plots/logs.csv.html

We'll use tabular metrics file `classes.csv` for this example:

```csv
```
predicted,actual
cat,cat
cat,cat
@@ -124,13 +128,13 @@ cat,turtle
...
```

A predefined confusion matrix
The predefined confusion matrix
[template](/doc/command-reference/plots#plot-templates) (in
`.dvc/plots/confusion.json`) shows how metric differences can be faceted by
separate plots:
separate plots. It can be enabled with `-t` (`--template`):

```dvc
$ dvc plots diff -t confusion -x predicted -d classes.csv
$ dvc plots diff -t confusion --targets classes.csv -x predicted
file:///Users/usr/src/test/plot_old/classes.csv.html
```
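
For intuition about what a confusion matrix has to display, the counting step can be sketched in a few lines of Python. This is an illustration only, not how DVC or the template processes the data; the rows in `SAMPLE` are made up in the same `predicted,actual` shape as `classes.csv`.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the same shape as classes.csv
SAMPLE = """predicted,actual
cat,cat
cat,cat
dog,dog
cat,turtle
"""

# Count how often each (actual, predicted) pair occurs
counts = Counter(
    (row["actual"], row["predicted"])
    for row in csv.DictReader(io.StringIO(SAMPLE))
)

for (actual, predicted), total in sorted(counts.items()):
    print(f"actual={actual} predicted={predicted} count={total}")
```

Each distinct pair corresponds to one cell of the matrix, and off-diagonal pairs such as a turtle predicted as a cat are the misclassifications.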
