Regular updates & plots 1.0 update #1382

Merged: 64 commits, Jun 11, 2020
Commits
8030c46
cmd ref: add note that move creates dirs
jorgeorpinel May 31, 2020
5bffb49
cmd ref: improve structure of add ref desc.
jorgeorpinel May 31, 2020
431ffc0
grammar: add some commas
jorgeorpinel May 31, 2020
3f2d554
term: checksum -> hash value in dvcignore guide
jorgeorpinel May 31, 2020
86b8f62
style: lower case bullet text
jorgeorpinel Jun 1, 2020
b75f314
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 1, 2020
47ad311
cmd ref: remove some redundancy in metrics index
jorgeorpinel Jun 1, 2020
2f1e09c
cmd ref: update plots refs synopsis and descriptions
jorgeorpinel Jun 1, 2020
3a723a0
Add plots modify cmd
dmpetrov Jun 2, 2020
bc54b22
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 2, 2020
177de9b
Merge branch '2020-05-31' of github.com:iterative/dvc.org into 2020-0…
jorgeorpinel Jun 2, 2020
f59eea9
typo: CSV->csv
jorgeorpinel Jun 3, 2020
c73f7c0
term: working tree -> workspace
jorgeorpinel Jun 3, 2020
140608c
cmd ref: couple improvements to add ref
jorgeorpinel Jun 3, 2020
0028b53
Update config/prismjs/dvc-commands.js
jorgeorpinel Jun 3, 2020
8e572a5
cmd ref: update plots modify description
jorgeorpinel Jun 3, 2020
434fe96
cmd ref: add plots modify to nav, with a few more improvements
jorgeorpinel Jun 3, 2020
5108e14
cmd ref: plots --show-json -> --show-vega
jorgeorpinel Jun 3, 2020
fba18c3
rename x-lab to x-label
dmpetrov Jun 4, 2020
a75dda2
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 6, 2020
30493ab
cmd ref: review descriptions of plots index, show, and diff
jorgeorpinel Jun 6, 2020
d33aecc
cmd ref: review and update old plots cmds options
jorgeorpinel Jun 6, 2020
e385a02
cmd ref: a couple more option updates
jorgeorpinel Jun 6, 2020
22b648c
cmd ref: emphasize add works with any large file/dir
jorgeorpinel Jun 6, 2020
f3cba9c
cmd ref: updae plots modify top half (definition, description)
jorgeorpinel Jun 6, 2020
532184b
cmd ref: improve all plot cmd option descriptions
jorgeorpinel Jun 6, 2020
4d9cd71
Update content/docs/command-reference/plots/modify.md
jorgeorpinel Jun 7, 2020
791dbd2
cmd ref: review examples (mainly images) in plots modify
jorgeorpinel Jun 8, 2020
697ab77
Merge branch '2020-05-31' of github.com:iterative/dvc.org into 2020-0…
jorgeorpinel Jun 8, 2020
3eb3858
cmd ref: rephrase info about how data arrays are injected to plot tem…
jorgeorpinel Jun 8, 2020
90cac8e
cmd ref: update info on how targets for for plots show/diff
jorgeorpinel Jun 8, 2020
94027c5
cmd ref: double check all plots examples
jorgeorpinel Jun 8, 2020
19fbe99
cmd ref: remove info about plots show --select
jorgeorpinel Jun 8, 2020
c51e2ad
cmd ref: update add desc
jorgeorpinel Jun 8, 2020
13dce7c
cmd ref: re-explain dvc add for dirs
jorgeorpinel Jun 8, 2020
3a418f6
cmd ref: improve description about targets in plots diff
jorgeorpinel Jun 8, 2020
4729c0d
cmd ref: make emoji note in plots index
jorgeorpinel Jun 8, 2020
5a92508
cmd ref: remove ineffective CSV code block highlighting from plots refs
jorgeorpinel Jun 8, 2020
aa620cc
get started: improve intro in index
jorgeorpinel Jun 8, 2020
1357d22
glossary: remove external deps entry (no need)
jorgeorpinel Jun 8, 2020
74c5234
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 8, 2020
9e26518
cmd ref: add info about column indexing for headerless tables
jorgeorpinel Jun 9, 2020
0bfb2ad
cmd ref: update template metavar for plots subcommands
jorgeorpinel Jun 9, 2020
ae1d911
cmd ref: mention YAML is supported for plots
jorgeorpinel Jun 9, 2020
da639ef
cmd ref: rename template metavar again in plots
jorgeorpinel Jun 9, 2020
a467891
cmd ref: clarify plots modify --no-csv-header
jorgeorpinel Jun 9, 2020
f6a2e4f
cmd ref: add note about plots modify in show and diff
jorgeorpinel Jun 9, 2020
a613233
cmd ref: update all plots options again
jorgeorpinel Jun 9, 2020
ba94d6d
cmd ref: more updates to plots et al. per Ivan's review
jorgeorpinel Jun 9, 2020
0d5f502
cmd ref: multiple plots diff --targets allowed
jorgeorpinel Jun 9, 2020
439f8e0
cmd ref: update note about detault metrics in index
jorgeorpinel Jun 10, 2020
7bce905
cmd ref: emphasize add --recursive is rarely needed
jorgeorpinel Jun 10, 2020
edd0323
cmd ref: plots diff: update revisions arg desc
jorgeorpinel Jun 10, 2020
c0ce897
cmd ref: mention column names and numbers in plots {cmd} -x and -y
jorgeorpinel Jun 10, 2020
6bcf06d
cmd ref: emphasize that metrics diff is not a real diff
jorgeorpinel Jun 10, 2020
b0f49c3
cmd ref: simplify note on plots targets
jorgeorpinel Jun 10, 2020
471a565
cmd ref: how to id colmns in plots modify --no-csv-header
jorgeorpinel Jun 10, 2020
9169967
cmd ref: add default target behavior to plots show and diff
jorgeorpinel Jun 10, 2020
de9110a
cmd ref: rename plots option --no-header
jorgeorpinel Jun 10, 2020
1bcae8d
cmd ref: term: prop->property (plots)
jorgeorpinel Jun 10, 2020
2c50676
cmd ref: more details on metrics index
jorgeorpinel Jun 11, 2020
dc0725c
cmd ref: more details on plots index
jorgeorpinel Jun 11, 2020
1970fc3
cmd ref: note about disply props in plots modify
jorgeorpinel Jun 11, 2020
cd602c9
Merge branch 'master' into 2020-05-31
jorgeorpinel Jun 11, 2020
24 changes: 13 additions & 11 deletions content/docs/command-reference/add.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,15 +24,22 @@ into <abbr>data artifacts</abbr> of the <abbr>project</abbr>. By default, these
are committed to the <abbr>cache</abbr> (use the `--no-commit` option to avoid
this, and `dvc commit` to finish the process when needed).

Note that [external data](/doc/user-guide/managing-external-data) (targets
`dvc add` is typically used to version control raw data or initial datasets from
which data processing [pipelines](/doc/command-reference/pipeline) are built,
but it can be used to track any large file or directory. We recommend using
`dvc run` to version control intermediate and final results (like ML models).
This way you bring data provenance and make your project
[reproducible](/doc/command-reference/repro).

💡 Note that [external data](/doc/user-guide/managing-external-data) (targets
outside the <abbr>workspace</abbr>) is supported.

Under the hood, a few actions are taken for each file (or directory) in
`targets`:

1. Calculate the file hash.
2. Move the file contents to the cache directory (by default in `.dvc/cache`),
using the file hash to form the cached file names. (See
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file names. (See
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details
Expand All @@ -56,13 +63,15 @@ more details.
> treated as _changed_ by `dvc repro`, which always executes them. See `dvc run`
> to learn more about stage files.

By default DVC tries to use reflinks (see
By default, DVC tries to use reflinks (see
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache))
to avoid copying any file contents and to optimize DVC-file operations for large
files. DVC also supports other link types for use on file systems without
`reflink` support, but they have to be specified manually. Refer to the
`cache.type` config option in `dvc config cache` for more information.
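
The three steps listed for each target (hash, move to cache, link back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DVC's implementation: the function name `add_to_cache` is hypothetical, MD5 and the two-character prefix directory are assumed details of the cache layout, and a hard link stands in for a reflink because the Python standard library has no portable reflink call.

```python
import hashlib
import os
import shutil
from pathlib import Path


def add_to_cache(target: Path, cache_dir: Path) -> Path:
    """Illustrative only: hash a file, move it into a cache, link it back."""
    # 1. Calculate the file hash (MD5 is an assumption of this sketch).
    file_hash = hashlib.md5(target.read_bytes()).hexdigest()

    # 2. Move the contents into the cache, naming the cached file after the
    #    hash. The two-character prefix directory is an assumed layout.
    cached = cache_dir / file_hash[:2] / file_hash[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if cached.exists():
        target.unlink()  # identical content is already cached
    else:
        shutil.move(str(target), str(cached))

    # 3. Attempt to replace the file with a link to the cached data,
    #    falling back to a plain copy when linking is not possible.
    try:
        os.link(cached, target)  # hard link used as a stand-in for a reflink
    except OSError:
        shutil.copy2(cached, target)
    return cached
```

Because the cached file name is derived from the content, adding the same data twice results in a single cached copy.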

### Tracking directories

A `dvc add` target can be an individual file or a directory. There are two ways
to work with directory hierarchies with `dvc add`:

Expand All @@ -78,13 +87,6 @@ to work with directory hierarchies with `dvc add`:
(with `.dir` extension), that in turn points to the files added from the
hierarchy.

`dvc add` is typically used to version control raw data or initial datasets from
which data processing [pipelines](/doc/command-reference/pipeline) are built,
but it can be used to track any large file or directory. We recommend using
`dvc run` to version control intermediate and final results (like ML models).
This way you bring data provenance and make your project
[reproducible](/doc/command-reference/repro).

## Options

- `-R`, `--recursive` - determines the files to add by searching each target
Expand Down
11 changes: 2 additions & 9 deletions content/docs/command-reference/metrics/index.md
Expand Up @@ -71,22 +71,15 @@ to compare and pick the best performing experiment.

### Default metric files

`dvc metrics` subcommands use all metric files that are specified in `dvc.yaml`
by default. There's no need to specify metric file names to see these metrics.
Metric files can be added to `dvc.yaml` with the `--metrics` (`-m`) or
`--metrics-no-cache` (`-M`) options of `dvc run`, or manually to the `metrics`
section of a stage in `dvc.yaml`:
`dvc metrics` subcommands by default use the metric files specified in
`dvc.yaml` (if any), for example `summary.json` below:

```yaml
stages:
train:
cmd: python train.py
deps:
- users.csv
params:
- epochs
- dropout
- lr
outs:
- model.pkl
metrics:
Expand Down
7 changes: 4 additions & 3 deletions content/docs/command-reference/move.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,17 @@ positional arguments:
`dvc move` is useful when a `src` file or directory has previously been added to
the <abbr>project</abbr> with `dvc add`, creating a
[DVC-file](/doc/user-guide/dvc-file-format) (with `src` as a dependency).
`dvc move` behaves like `mv src dst`, moving `src` to the given `dst` path, but
it also renames and updates the corresponding DVC-file appropriately.
`dvc move` behaves similarly to `mv src dst`, moving `src` to the given `dst`
path, but it also renames and updates the corresponding DVC-file appropriately.

> Note that `src` may be a copy or a
> [link](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
> to a file in cache. The cached file is not changed by this command.

If the destination path (`dst`) already exists and is a directory, the source
code file or directory (`src`) is moved unchanged into this folder along with
the corresponding DVC-file.
the corresponding DVC-file. Otherwise, any directories in `dst` are created,
similar to `mkdir -p`.
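
The destination handling described here can be sketched as follows. This is a minimal illustration, not DVC's code: `move_with_parents` is a hypothetical helper, and it only moves the data, leaving out the renaming and updating of the DVC-file.

```python
import shutil
from pathlib import Path


def move_with_parents(src: Path, dst: Path) -> Path:
    """Move src to dst, creating missing directories in dst first."""
    if dst.is_dir():
        # Existing directory: src is moved into it unchanged.
        final = dst / src.name
    else:
        # Otherwise create any missing parent directories, like `mkdir -p`.
        dst.parent.mkdir(parents=True, exist_ok=True)
        final = dst
    shutil.move(str(src), str(final))
    return final
```

For example, `move_with_parents(Path("data.csv"), Path("a/b/data.csv"))` would create `a/` and `a/b/` if they do not exist yet.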

Let's imagine the following scenario:

Expand Down
56 changes: 26 additions & 30 deletions content/docs/command-reference/plots/diff.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,14 @@ plotting them in a single image.
## Synopsis

```usage
usage: dvc plots diff [-h] [-q | -v] [-t [TEMPLATE]] [-d [DATAFILE]] [-f FILE]
[-s SELECT] [-x X] [-y Y] [--stdout] [--no-csv-header]
[--no-html] [--title TITLE] [--xlab XLAB] [--ylab YLAB]
[revisions [revisions ...]]
usage: dvc plots diff [-h] [-q | -v] [-t <path>]
[--targets [<path> [<path> ...]]] [-o <path>]
[-x <field>] [-y <field>] [--no-csv-header]
[--show-json] [--title <text>] [--xlab <text>]
[--ylab <text>] [revisions [revisions ...]]

positional arguments:
revisions Git commits to plot from
revisions Git commits to plot from/to
Member: what does /to part mean?

Contributor Author: That you can choose the from revision and also the to revision(s).

Member: hmm ... feels really weird

why just "Git commits to read metrics from to plot" is not enough?

Contributor Author: I went for Git commits to find metrics to diff for now (in d22c9c6). Cc @efiop

Contributor Author: UPDATE: Actually I changed diff for compare in edd0323

```

## Description
Expand All @@ -21,8 +22,8 @@ This command visualize difference between metrics among experiments in the
repository history. Requires that Git is being used to version the metrics
files.

The metrics file needs to be specified through `-d`/`--datafile` option. Also, a
plot can be customized with
The metrics file needs to be specified through the `--targets` option. Also, a plot
can be customized with
[plot templates](/doc/command-reference/plots#plot-templates) using the
`--template` option. To learn more about the file formats and templates please
see `dvc plots`.
Expand All @@ -39,43 +40,38 @@ resulting plot shows all of them in a single output.

This command can work with metric files that are committed to a repository
history, data files controlled by DVC, or any other file in the workspace. In
the case of DVC-tracked `datafile`, the `revisions` are used to find the
the case of DVC-tracked `targets`, the `revisions` are used to find the
corresponding [DVC-files](/doc/user-guide/dvc-file-format).

## Options

- `-d [DATAFILE], --datafile [DATAFILE]` - metrics file to visualize.
- `--targets [TARGETS]` (**required**) - metrics file to visualize.

- `-t [TEMPLATE], --template [TEMPLATE]` -
- `-t <path>, --template <path>` -
[plot template](/doc/command-reference/plots#plot-templates) to be injected
with data. The default template is `.dvc/plots/default.json`. See more details
in `dvc plots`.

- `-f FILE, --file FILE` - name of the generated file. By default, the output
- `-o <path>, --out <path>` - name of the generated file. By default, the output
file name is equal to the input filename with additional `.html` suffix or
`.json` suffix for `--no-html` mode.
`.json` suffix for `--show-json` mode.

- `--no-html` - do not wrap output Vega specification (JSON) with HTML.
- `-x <field>` - field name for X axis. An auto-generated `index` field is used
by default.

- `-x X` - field name for X axis. An auto-generated `index` field is used by
default.
- `-y <field>` - field name for Y axis. The last column or field found in the
`targets` is used by default.

- `-y Y` - field name for Y axis. The last column or field found in the
`datafile` is used by default.
- `--xlab <text>` - X axis title. The X field name is the default title.

- `-s SELECT, --select SELECT` - select which fields or JSONPath to store in the
metrics file [metadata](https://vega.github.io/vega/docs/data/). The
auto-generated, zero-based `index` column is always included.
- `--ylab <text>` - Y axis title. The Y field name is the default title.

- `--xlab XLAB` - X axis title. The X field name is the default title.
- `--title <text>` - plot title.

- `--ylab YLAB` - Y axis title. The Y field name is the default title.
- `--show-json` - show output in JSON format.

- `--title TITLE` - plot title.

- `-o, --stdout` - print plot content to stdout.

- `--no-csv-header` - provided CSV or TSV datafile does not have a header.
- `--no-csv-header` - lets DVC know that CSV or TSV `targets` do not have a
header.

- `-h`, `--help` - prints the usage/help message, and exits.
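
The axis defaults in the options above (an auto-generated index for X, the last column for Y) can be illustrated with a short Python sketch. It is not how DVC reads its targets: `pick_axes` is a hypothetical function, and identifying headerless columns by their position is an assumption made for the example.

```python
import csv
import io


def pick_axes(text, x=None, y=None, has_header=True):
    """Return (x_values, y_values) using the defaults described above."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        fields, rows = rows[0], rows[1:]
    else:
        # Assumption: headerless columns are identified by their position.
        fields = [str(i) for i in range(len(rows[0]))]

    y = fields[-1] if y is None else y  # default Y: last column
    y_values = [row[fields.index(y)] for row in rows]
    if x is None:
        x_values = list(range(len(rows)))  # default X: generated index
    else:
        x_values = [row[fields.index(x)] for row in rows]
    return x_values, y_values
```

With a table whose header is `epoch,loss,accuracy`, calling `pick_axes(text)` plots `accuracy` against the row index, while `pick_axes(text, x="epoch", y="loss")` selects both fields explicitly.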

Expand All @@ -90,7 +86,7 @@ To visualize the difference between uncommitted changes of a metrics file and
the last commit:

```dvc
$ dvc plots diff -d logs.csv
$ dvc plots diff --targets logs.csv
file:///Users/dmitry/src/plots/logs.html
```

Expand All @@ -100,7 +96,7 @@ The difference between two versions (commit hashes, tags, or branches can be
provided):

```dvc
$ dvc plots diff -d logs.csv HEAD 0135527
$ dvc plots diff --targets logs.csv HEAD 0135527
file:///Users/usr/src/plots/logs.csv.html
```

Expand Down Expand Up @@ -130,7 +126,7 @@ A predefined confusion matrix
separate plots:

```dvc
$ dvc plots diff -t confusion -x predicted -d classes.csv
$ dvc plots diff -t confusion -x predicted --targets classes.csv
file:///Users/usr/src/test/plot_old/classes.csv.html
```

Expand Down
10 changes: 4 additions & 6 deletions content/docs/command-reference/plots/index.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# plots

Contains commands to visualize _plot metrics_ in structured files (JSON, CSV, or
TSV): [show](/doc/command-reference/plots/show),
A set of commands to visualize and compare _plot metrics_ in structured files
(JSON, CSV, or TSV): [show](/doc/command-reference/plots/show),
[diff](/doc/command-reference/plots/diff).

## Synopsis
Expand All @@ -11,10 +11,8 @@ usage: dvc plots [-h] [-q | -v] {show,diff} ...

positional arguments:
COMMAND
show Generate a plot image file from a metrics file.
diff Plot differences in metrics between commits in the
DVC repository, or between the last commit and the
workspace.
show Generate plot from a metrics file.
diff Plot differences in metrics between commits.
```

## Types of metrics
Expand Down
55 changes: 26 additions & 29 deletions content/docs/command-reference/plots/show.md
Original file line number Diff line number Diff line change
@@ -1,58 +1,55 @@
# plots show

Generate a plot image from from a [plot metrics](/doc/command-reference/plots)
file.
Generate a [plot](/doc/command-reference/plots) from a metrics file.

## Synopsis

```usage
usage: dvc plots show [-h] [-q | -v] [-t [TEMPLATE]] [-f FILE]
[-s SELECT] [-x X] [-y Y] [--stdout]
[--no-csv-header] [--no-html] [--title TITLE]
[--xlab XLAB] [--ylab YLAB] [datafile]
usage: dvc plots show [-h] [-q | -v] [-t <path>] [-o <path>]
[-x <field>] [-y <field>] [--no-csv-header]
[--show-json] [--title <text>] [--xlab <text>]
[--ylab <text>] targets [targets ...]

positional arguments:
datafile Metrics file to visualize
targets Metrics files to visualize.
```

## Description

This command provides a quick way to visualize metrics such as loss functions,
AUC curves, confusion matrices, etc. Please see `dvc plots` for information on
the supported data formats and other relevant details about DVC plots.
AUC curves, confusion matrices, etc. One or more `targets` are required as
arguments to this command.

Please see `dvc plots` for information on the supported data formats and other
relevant details about DVC plots.

## Options

- `-t [TEMPLATE], --template [TEMPLATE]` -
- `-t <path>, --template <path>` -
[plot template](/doc/command-reference/plots#plot-templates) to be injected
with data. The default template is `.dvc/plots/default.json`. See more details
in `dvc plots`.

- `-f FILE, --file FILE` - name of the generated file. By default, the output
- `-o <path>, --out <path>` - name of the generated file. By default, the output
file name is equal to the input filename with additional `.html` suffix or
`.json` suffix for `--no-html` mode.

- `--no-html` - do not wrap output Vega specification (JSON) with HTML.

- `-x X` - field name for X axis. An auto-generated `index` field is used by
default.
`.json` suffix for `--show-json` mode.

- `-y Y` - field name for Y axis. The last column or field found in the
`datafile` is used by default.
- `-x <field>` - field name for X axis. An auto-generated `index` field is used
by default.

- `-s SELECT, --select SELECT` - select which fields or JSONPath to store in the
metrics file [metadata](https://vega.github.io/vega/docs/data/). The
auto-generated, zero-based `index` column is always included.
- `-y <field>` - field name for Y axis. The last column or field found in the
`targets` is used by default.

- `--xlab XLAB` - X axis title. The X field name is the default title.
- `--xlab <text>` - X axis title. The X field name is the default title.

- `--ylab YLAB` - Y axis title. The Y field name is the default title.
- `--ylab <text>` - Y axis title. The Y field name is the default title.

- `--title TITLE` - plot title.
- `--title <text>` - plot title.

- `-o, --stdout` - print plot content to stdout.
- `--show-json` - show output in JSON format.

- `--no-csv-header` - provided CSV or TSV datafile does not have a header.
- `--no-csv-header` - lets DVC know that CSV or TSV `targets` do not have a
header.

- `-h`, `--help` - prints the usage/help message, and exits.
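
The default output naming described for `-o`/`--out` amounts to appending a suffix to the input file name. A minimal sketch, where `default_output_name` is a hypothetical helper and not part of DVC:

```python
def default_output_name(target: str, show_json: bool = False) -> str:
    """Append `.json` (JSON mode) or `.html` to the input file name."""
    return target + (".json" if show_json else ".html")
```

So `logs.csv` becomes `logs.csv.html` by default, or `logs.csv.json` in `--show-json` mode.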

Expand Down Expand Up @@ -128,11 +125,11 @@ file:///Users/usr/src/plots/logs.csv.html
In many automation scenarios (like CI/CD for ML), it is convenient to have the
[Vega-Lite](https://vega.github.io/vega-lite/) specification instead of the
entire HTML plot file. For example, to generate another image format like PNG
or JPEG, or to include differently into a web app. The `--no-html` option
or JPEG, or to include differently into a web app. The `--show-json` option
prevents wrapping the plot in HTML. Note that the resulting file is JSON:

```dvc
$ dvc plots show --select accuracy --no-html logs.csv
$ dvc plots show --select accuracy --show-json logs.csv
file:///Users/usr/src/plots/logs.csv.json
```

Expand Down
4 changes: 2 additions & 2 deletions content/docs/command-reference/remote/add.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,7 @@ The following are the types of remote storage (protocols) supported:
$ dvc remote add myremote s3://bucket/path
```

By default DVC expects your AWS CLI is already
By default, DVC expects your AWS CLI is already
[configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
DVC will be using default AWS credentials file to access S3. To override some of
these settings, use the parameters described in `dvc remote modify`.
Expand Down Expand Up @@ -237,7 +237,7 @@ modified.
$ dvc remote add myremote gs://bucket/path
```

By default DVC expects your GCP CLI is already
By default, DVC expects your GCP CLI is already
[configured](https://cloud.google.com/sdk/docs/authorizing). DVC will be using
default GCP key file to access Google Cloud Storage. To override some of these
settings, use the parameters described in `dvc remote modify`.
Expand Down
2 changes: 1 addition & 1 deletion content/docs/command-reference/remote/modify.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ The following are the customizable types of remote storage (protocols):

### Click for Amazon S3

By default DVC expects your AWS CLI is already
By default, DVC expects your AWS CLI is already
[configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
DVC will be using default AWS credentials file to access S3. To override some of
these settings, you could use the following options:
Expand Down