ref: exp-related updates in other to cmds, etc. #2242

Merged
merged 16 commits into from
Mar 15, 2021
52 changes: 27 additions & 25 deletions content/docs/command-reference/dag.md
@@ -1,7 +1,7 @@
# dag

Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of connected
[stages](/doc/command-reference/run).
Visualize the <abbr>pipeline</abbr>(s) in `dvc.yaml` as one or more graph(s) of
connected [stages](/doc/command-reference/run).

## Synopsis

@@ -15,6 +15,11 @@ positional arguments:

## Description

Displays the stages of a pipeline up to the `target` stage. If omitted, it will
show the full project DAG.
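
The "up to the `target` stage" behavior described above amounts to collecting a stage's ancestors in the dependency graph. A minimal sketch follows, assuming a hypothetical `deps` dictionary as an in-memory stand-in for what `dvc.yaml` defines; it illustrates the idea and is not DVC's implementation.

```python
from collections import deque


def stages_up_to(deps, target=None):
    """Return the target stage plus all of its ancestors.

    `deps` (hypothetical) maps each stage name to the stages it depends on.
    With no target, every stage in the project is returned.
    """
    if target is None:
        return set(deps)
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for parent in deps.get(stage, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Illustrative pipeline; the stage names are examples only.
pipeline = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
}
print(sorted(stages_up_to(pipeline, "train")))
```

Calling `stages_up_to(pipeline)` without a target returns all four stages, which mirrors showing the full project DAG.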

### Directed acyclic graph

A data pipeline, in general, is a series of data processing
[stages](/doc/command-reference/run) (for example, console commands that take an
input and produce an outcome). The connections between stages are formed by the
@@ -33,29 +38,7 @@ restore one or more pipelines later (see `dvc repro`).
> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.

The `dvc dag` command displays the stages of a pipeline up to the target stage.
If `target` is omitted, it will show the full project DAG.

## Options

- `--full` - show full DAG that the `target` stage belongs to, instead of
showing only its ancestors.

- `--dot` - show DAG in
[DOT](<https://en.wikipedia.org/wiki/DOT_(graph_description_language)>)
format. It can be passed to third party visualization utilities.

- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the
stages themselves. The graph may be significantly different.

- `-h`, `--help` - prints the usage/help message, and exit.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.

## Paginating the output
### Paginating the output

This command's output is automatically piped to
[less](<https://en.wikipedia.org/wiki/Less_(Unix)>) if available in the terminal
@@ -84,6 +67,25 @@ example in Bash, we could add the following line to `~/.bashrc`:
export DVC_PAGER=more
```

## Options

- `--full` - show full DAG that the `target` stage belongs to, instead of
showing only its ancestors.

- `--dot` - show DAG in
[DOT](<https://en.wikipedia.org/wiki/DOT_(graph_description_language)>)
format. It can be passed to third party visualization utilities.

- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the
stages themselves. The graph may be significantly different.

- `-h`, `--help` - prints the usage/help message, and exit.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.
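
To illustrate what a DOT description of a stage graph looks like (see `--dot` above), here is a small sketch that renders edges as a DOT `digraph`. The `to_dot` helper and the edge direction (upstream to downstream) are assumptions made for illustration; the exact text that `dvc dag --dot` prints may differ.

```python
def to_dot(edges):
    """Render (upstream, downstream) stage pairs as a DOT digraph string."""
    lines = ["digraph {"]
    for upstream, downstream in edges:
        # Quote node names so that names with special characters stay valid.
        lines.append(f'    "{upstream}" -> "{downstream}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("prepare", "featurize"), ("featurize", "train")]))
```

Text in this format can be handed to third-party tools that understand DOT, such as Graphviz.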

## Example: Visualize a DVC Pipeline

Visualize the prepare, featurize, train, and evaluate stages of a pipeline as
21 changes: 9 additions & 12 deletions content/docs/command-reference/gc.md
@@ -13,27 +13,24 @@ usage: dvc gc [-h] [-q | -v] [-w] [-a] [-T] [--all-commits]

## Description

This command deletes (garbage collects) data files or directories that exist in
DVC cache but are no longer needed. With `--cloud` it also removes data in
This command can delete (garbage collect) data files or directories that exist
in the cache but are no longer needed. With `--cloud`, it also removes data in
[remote storage](/doc/command-reference/remote).

To avoid accidentally deleting data, it raises an error and doesn't touch any
files if no scope options are provided. It means it's user's responsibility to
explicitly provide the right set of options to specify what data is still needed
(so that DVC can figure out what files can be safely deleted).
To avoid accidentally deleting data, `dvc gc` doesn't do anything unless one or
a combination of scope options are provided (`--workspace`, `--all-branches`,
`--all-tags`, `--all-commits`). Use these to indicate which cached files are
still needed. See the [Options](#options) section for more details.

One of the scope options (`--workspace`, `--all-branches`, `--all-tags`,
`--all-commits`, `--all-experiments`) or a combination of them must be provided.
Each of them corresponds to keeping the data for the current workspace, and for
a certain set of commits (determined by reading the <abbr>DVC files</abbr> in
them). See the [Options](#options) section for more details.
The data kept is determined by reading the <abbr>DVC files</abbr> in the set of
commits of the given scope.
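
Conceptually, the scope options define a set of data that is still needed, and everything else in the cache becomes a candidate for removal. The sketch below models that as a set difference. The function name, the hash strings, and the dictionary keys are hypothetical, and raising an error when no scope is given is just one way to model the safeguard described above.

```python
def collect_garbage(cached, needed_by_scope):
    """Return the cached entries that no selected scope still needs.

    `cached` is a set of cache entry identifiers; `needed_by_scope`
    (hypothetical) maps a scope name to the entries that scope references.
    """
    if not needed_by_scope:
        # Safeguard: with no scope, nothing is considered safe to delete.
        raise ValueError("no scope given; refusing to delete anything")
    needed = set().union(*needed_by_scope.values())
    return set(cached) - needed


cached = {"a1", "b2", "c3"}
scopes = {"workspace": {"a1"}, "all_tags": {"b2"}}
print(collect_garbage(cached, scopes))
```

Adding more scopes can only grow the set of needed data, so combining scope options never removes more than using one of them alone.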

> Note that `dvc gc` tries to fetch any missing
> [`.dir` files](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
> from [remote storage](/doc/command-reference/remote) to the local
> <abbr>cache</abbr>, in order to determine which files should exist inside
> cached directories. These files may be missing if the cache directory was
> previously garbage collected, in a newly cloned copy of the repo, etc.
> previously garbage collected, or in a newly cloned copy of the repo, etc.

Unless the `--cloud` option is used, `dvc gc` does not remove data files from
any remote. This means that any files collected from the local cache can be
44 changes: 25 additions & 19 deletions content/docs/command-reference/metrics/diff.md
@@ -3,6 +3,8 @@
Compare [metrics](/doc/command-reference/metrics) between two commits in the
<abbr>DVC repository</abbr>, or between a commit and the <abbr>workspace</abbr>.

> Requires that Git is being used to version the project.

## Synopsis

```usage
@@ -20,26 +22,29 @@ positional arguments:

## Description

This command provides a quick way to compare metrics among experiments in the
repository history. The differences shown by this command include the new value,
and numeric difference (delta) from the previous value of metrics (rounded to 5
Provides a quick way to compare metrics among experiments in the repository
history. The differences shown by this command include the new value, and
numeric difference (delta) from the previous value of metrics (rounded to 5
digits precision).

`a_rev` and `b_rev` are Git commit hashes, tag, or branch names. If none are
specified, `dvc metrics diff` compares metrics currently present in the
Without arguments, `dvc metrics diff` compares metrics currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed versions
(required). A single specified revision results in comparing the workspace and
that version.
(required). Only metrics that changed are listed, by default (show everything
with `--all`).
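
The comparison described above (new value, delta from the previous value, rounding, and listing only changed metrics unless asked otherwise) can be sketched as follows. This is an illustrative model over plain dictionaries, not DVC's code; `metrics_diff` and its `show_all` parameter are hypothetical names that loosely mirror `--precision` and `--all`.

```python
def metrics_diff(old, new, precision=5, show_all=False):
    """Return (name, new_value, change) rows for numeric metrics.

    `old` and `new` map metric names to numbers. A metric missing from
    `old` gets a change of None. Unchanged metrics are skipped unless
    `show_all` is true.
    """
    rows = []
    for name in sorted(new):
        value = new[name]
        previous = old.get(name)
        if previous is None:
            change = None
        else:
            change = round(value - previous, precision)
        if show_all or change != 0:
            rows.append((name, round(value, precision), change))
    return rows


old = {"auc": 0.9, "loss": 0.25}
new = {"auc": 0.912345678, "loss": 0.25}
for row in metrics_diff(old, new):
    print(row)
```

Here only `auc` is listed by default because `loss` did not change; passing `show_all=True` includes both rows.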

All metrics defined in `dvc.yaml` are used by default, but specific metrics
files can be specified with the `--targets` option
`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to
compare. A single specified revision results in comparing it against the
workspace.

> Note that targets don't necessarily have to be defined in `dvc.yaml`. For that
> reason, this command doesn't require an existing DVC project to run in; it
> works in any Git repo.

Another way to display metrics is the `dvc metrics show` command, which just
lists all the current metrics, without comparisons.
All metrics defined in `dvc.yaml` are used by default, but specific metrics
files can be specified with the `--targets` option.

Another way to display metrics is the `dvc metrics show` command, which lists
all the current metrics (without comparisons).

## Options

@@ -61,20 +66,21 @@ lists all the current metrics, without comparisons.

- `--all` - list all metrics, including those without changes.

- `--show-json` - prints the command's output in easily parsable JSON format,
- `--show-json` - prints the command's output in JSON format (machine-readable)
instead of a human-readable table.

- `--show-md` - prints the command's output in Markdown table format.
- `--show-md` - prints the command's output in the Markdown table format
([GFM](https://github.github.com/gfm/#tables-extension-)).

- `--old` - show old metric value in addition to the new value.
- `--old` - include the "Old" value column in addition to the new "Value" (and
"Change"). Useful when the values are not numeric, for example.

- `--no-path` - don't show metric path in the result table. This option is
useful when only one metrics file is in use or there is no intersection
between the metric names.
- `--no-path` - hide the "Path" column that lists the param/metrics file
location. Useful when only one metrics file exists, for example.

- `--precision <n>` -
[round](https://docs.python.org/3/library/functions.html#round) metrics to `n`
digits precision after the decimal point. Rounds to 5 digits by default.
[round](https://docs.python.org/3/library/functions.html#round) decimal values
to `n` digits of precision (5 by default).

- `-h`, `--help` - prints the usage/help message, and exit.

29 changes: 19 additions & 10 deletions content/docs/command-reference/params/diff.md
@@ -4,6 +4,8 @@ Show changes in [parameters](/doc/command-reference/params) between commits in
the <abbr>DVC repository</abbr>, or between a commit and the
<abbr>workspace</abbr>.

> Requires that Git is being used to version the project.

## Synopsis

```usage
@@ -20,21 +22,28 @@ positional arguments:
## Description

Provides a quick way to compare parameter values among experiments in the
repository history. Requires that Git is being used to version the project
params.
repository history. The differences shown by this command include the old and
new param values, along with the param name.

> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the `-p` (`--params`) option of `dvc run`).

Without arguments, this command compares parameters currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed version.
This includes everything in `params.yaml` (default parameters file) as well all
the `params` used in `dvc.yaml`. Values in `dvc.lock` are used for comparison.
Only params that have changes are listed.
Without arguments, `dvc params diff` compares parameters currently present in
the <abbr>workspace</abbr> (uncommitted changes) with the latest committed
versions (required). This includes everything in `params.yaml` (default
parameters file) as well as all the `params` used in `dvc.yaml`. Values in
`dvc.lock` are used for comparison. Only params that have changes are listed.
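
Because parameter values are not necessarily numeric, the comparison reports old and new values side by side rather than a delta. A minimal sketch of that idea over two flat dictionaries follows; `params_diff` is a hypothetical helper, and reading the values from `params.yaml` or `dvc.lock` is left out.

```python
def params_diff(old, new):
    """Return {name: (old_value, new_value)} for params that differ.

    A param present on only one side shows None for the missing value.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


committed = {"lr": 0.01, "epochs": 10, "optimizer": "sgd"}
workspace = {"lr": 0.001, "epochs": 10, "optimizer": "adam"}
print(params_diff(committed, workspace))
```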

`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to
compare. A single specified revision results in comparing it against the
workspace.

All params defined in `dvc.yaml` are used by default, but specific ones can be
specified with the `--targets` option.

> Note that unlike `dvc diff`, this command doesn't always need DVC files to
> find params files (see `--targets` option). For that reason, it doesn't
> require an existing DVC project to run in. It can work in any Git repo.
> Note that targets don't necessarily have to be defined in `dvc.yaml`. For that
> reason, it doesn't require an existing DVC project to run in. It can work in
> any Git repo.

## Options

2 changes: 1 addition & 1 deletion content/docs/command-reference/plots/diff.md
@@ -27,7 +27,7 @@ versions of the <abbr>repository</abbr>, by overlaying them in a single plot.
> Note that unlike `dvc metrics diff`, this command does not calculate numeric
> differences between plots file values.

`revisions` are Git commit hashes, tag, or branch names. If none are specified,
`revisions` are Git commit hashes, tags, or branch names. If none are specified,
`dvc plots diff` compares plots currently present in the <abbr>workspace</abbr>
(uncommitted changes) with their latest commit (required). A single specified
revision results in comparing the workspace and that version.
8 changes: 4 additions & 4 deletions content/docs/command-reference/pull.md
@@ -79,8 +79,8 @@ used to see what files `dvc pull` would download.
- `-a`, `--all-branches` - determines the files to download by examining
`dvc.yaml` and `.dvc` files in all Git branches instead of just those present
in the current workspace. It's useful if branches are used to track
experiments or project checkpoints. Note that this can be combined with `-T`
below, for example using the `-aT` flag.
experiments. Note that this can be combined with `-T` below, for example using
the `-aT` flag.

- `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as
the workspace. Useful if tags are used to mark certain versions of an
@@ -112,8 +112,8 @@ used to see what files `dvc pull` would download.

- `--run-cache` - downloads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
the remote repository (to the cache only, like `dvc fetch --run-cache`). Note
that `dvc repro <stage_name>` is necessary to checkout these files (into the
the `dvc remote` (to the cache only, like `dvc fetch --run-cache`). Note that
`dvc repro <stage_name>` is necessary to checkout these files (into the
workspace) and update `dvc.lock`.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
6 changes: 3 additions & 3 deletions content/docs/command-reference/push.md
@@ -62,8 +62,8 @@ in the cache (compared to the default remote.) It can be used to see what files
- `-a`, `--all-branches` - determines the files to upload by examining
`dvc.yaml` and `.dvc` files in all Git branches instead of just those present
in the current workspace. It's useful if branches are used to track
experiments or project checkpoints. Note that this can be combined with `-T`
below, for example using the `-aT` flag.
experiments. Note that this can be combined with `-T` below, for example using
the `-aT` flag.

- `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as
the workspace. Useful if tags are used to mark certain versions of an
@@ -90,7 +90,7 @@ in the cache (compared to the default remote.) It can be used to see what files

- `--run-cache` - uploads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to
the remote repository.
the `dvc remote`.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to upload data to
remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the
9 changes: 5 additions & 4 deletions content/docs/command-reference/repro.md
@@ -179,10 +179,11 @@ up-to-date and only execute the final stage.
corresponding pipelines, including the target stages themselves. This option
has no effect if `targets` are not provided.

- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs
involved in the stages being reproduced, if they are found in the
[default remote storage](/doc/command-reference/remote/default). Note that it
tries the local run-cache first (unless `--no-run-cache` is also used).
- `--pull` - attempts to download outputs of stages found in the
[run-cache](/doc/user-guide/project-structure/internal-files#run-cache) during
reproduction. Uses the
[default remote storage](/doc/command-reference/remote/default). See also
`dvc pull`.

- `-h`, `--help` - prints the usage/help message, and exit.
