From af9b2a83ccc4f2ac6c2623efc6854f6ee1315f36 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 24 Feb 2021 22:54:48 -0600 Subject: [PATCH 1/7] ref: updates for consistency vs. exp refs --- .../docs/command-reference/metrics/diff.md | 44 +++++++++++-------- content/docs/command-reference/params/diff.md | 29 +++++++----- content/docs/command-reference/plots/diff.md | 2 +- content/docs/command-reference/pull.md | 4 +- content/docs/command-reference/push.md | 4 +- 5 files changed, 49 insertions(+), 34 deletions(-) diff --git a/content/docs/command-reference/metrics/diff.md b/content/docs/command-reference/metrics/diff.md index 4c89cc9b25..c5c58af6b4 100644 --- a/content/docs/command-reference/metrics/diff.md +++ b/content/docs/command-reference/metrics/diff.md @@ -3,6 +3,8 @@ Compare [metrics](/doc/command-reference/metrics) between two commits in the DVC repository, or between a commit and the workspace. +> Requires that Git is being used to version the project. + ## Synopsis ```usage @@ -20,26 +22,29 @@ positional arguments: ## Description -This command provides a quick way to compare metrics among experiments in the -repository history. The differences shown by this command include the new value, -and numeric difference (delta) from the previous value of metrics (rounded to 5 +Provides a quick way to compare metrics among experiments in the repository +history. The differences shown by this command include the new value and the +numeric difference (delta) from the previous value of metrics (rounded to 5 digits precision). -`a_rev` and `b_rev` are Git commit hashes, tag, or branch names. If none are -specified, `dvc metrics diff` compares metrics currently present in the +Without arguments, `dvc metrics diff` compares metrics currently present in the workspace (uncommitted changes) with the latest committed versions -(required). A single specified revision results in comparing the workspace and -that version. +(required).
Only metrics that changed are listed by default (show everything +with `--all`). -All metrics defined in `dvc.yaml` are used by default, but specific metrics -files can be specified with the `--targets` option +`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to +compare. A single specified revision results in comparing it against the +workspace. > Note that targets don't necessarily have to be defined in `dvc.yaml`. For that > reason, this command doesn't require an existing DVC project to run in; It > works in any Git repo. -Another way to display metrics is the `dvc metrics show` command, which just -lists all the current metrics, without comparisons. +All metrics defined in `dvc.yaml` are used by default, but specific metrics +files can be specified with the `--targets` option. + +Another way to display metrics is the `dvc metrics show` command, which lists +all the current metrics (without comparisons). ## Options @@ -61,20 +66,21 @@ lists all the current metrics, without comparisons. - `--all` - list all metrics, including those without changes. -- `--show-json` - prints the command's output in easily parsable JSON format, +- `--show-json` - prints the command's output in JSON format (machine-readable) instead of a human-readable table. -- `--show-md` - prints the command's output in Markdown table format. +- `--show-md` - prints the command's output in the Markdown table format + ([GFM](https://github.github.com/gfm/#tables-extension-)). -- `--old` - show old metric value in addition to the new value. +- `--old` - include the "Old" value column in addition to the new "Value" (and + "Change"). Useful when the values are not numeric, for example. -- `--no-path` - don't show metric path in the result table. This option is - useful when only one metrics file is in use or there is no intersection - between the metric names. +- `--no-path` - hide the "Path" column that lists the param/metrics file + location.
Useful when only one metrics file exists, for example. - `--precision ` - - [round](https://docs.python.org/3/library/functions.html#round) metrics to `n` - digits precision after the decimal point. Rounds to 5 digits by default. + [round](https://docs.python.org/3/library/functions.html#round) decimal values + to `n` digits of precision (5 by default). - `-h`, `--help` - prints the usage/help message, and exit. diff --git a/content/docs/command-reference/params/diff.md b/content/docs/command-reference/params/diff.md index c8244ccc9c..ee3cbdceda 100644 --- a/content/docs/command-reference/params/diff.md +++ b/content/docs/command-reference/params/diff.md @@ -4,6 +4,8 @@ Show changes in [parameters](/doc/command-reference/params) between commits in the DVC repository, or between a commit and the workspace. +> Requires that Git is being used to version the project. + ## Synopsis ```usage @@ -20,21 +22,28 @@ positional arguments: ## Description Provides a quick way to compare parameter values among experiments in the -repository history. Requires that Git is being used to version the project -params. +repository history. The differences shown by this command include the old and +new param values, along with the param name. > Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g. > with the the `-p` (`--params`) option of `dvc run`). -Without arguments, this command compares parameters currently present in the -workspace (uncommitted changes) with the latest committed version. -This includes everything in `params.yaml` (default parameters file) as well all -the `params` used in `dvc.yaml`. Values in `dvc.lock` are used for comparison. -Only params that have changes are listed. +Without arguments, `dvc params diff` compares parameters currently present in +the workspace (uncommitted changes) with the latest committed +versions (required).
This includes everything in `params.yaml` (default +parameters file) as well as all the `params` used in `dvc.yaml`. Values in +`dvc.lock` are used for comparison. Only params that have changes are listed. + +`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to +compare. A single specified revision results in comparing it against the +workspace. + +All params defined in `dvc.yaml` are used by default, but specific params files +can be selected with the `--targets` option. -> Note that unlike `dvc diff`, this command doesn't always need DVC files to -> find params files (see `--targets` option). For that reason, it doesn't -> require an existing DVC project to run in. It can work in any Git repo. +> Note that targets don't necessarily have to be defined in `dvc.yaml`. For that +> reason, it doesn't require an existing DVC project to run in. It can work in +> any Git repo. ## Options diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md index cfff56bb34..aa8b5d1f54 100644 --- a/content/docs/command-reference/plots/diff.md +++ b/content/docs/command-reference/plots/diff.md @@ -27,7 +27,7 @@ versions of the repository, by overlaying them in a single plot. > Note that unlike `dvc metrics diff`, this command does not calculate numeric > differences between plots file values. -`revisions` are Git commit hashes, tag, or branch names. If none are specified, +`revisions` are Git commit hashes, tags, or branch names. If none are specified, `dvc plots diff` compares plots currently present in the workspace (uncommitted changes) with their latest commit (required). A single specified revision results in comparing the workspace and that version.
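The `dvc metrics diff` description in patch 1/7 says that, for each changed metric, the command reports the new value, the numeric difference (delta) from the previous value, and optionally the old value (`--old`, useful for non-numeric values), with decimals rounded through Python's `round` (5 digits by default, see `--precision`). The following is only a minimal sketch of that comparison logic, assuming metrics are flat `{name: value}` dicts; the `diff_metrics` helper, its return shape, and the sample metric values are hypothetical and are not DVC's actual implementation or its `--show-json` schema.

```python
def _is_numeric(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_metrics(old, new, precision=5, show_all=False):
    """Hypothetical sketch: compare two flat {metric_name: value} dicts.

    Returns {name: {"old": ..., "new": ..., "change": ...}}. A delta is only
    computed when both values are numeric; otherwise "change" is None.
    """
    result = {}
    for name in sorted(set(old) | set(new)):
        old_val, new_val = old.get(name), new.get(name)
        if old_val == new_val and not show_all:
            continue  # unchanged metrics are skipped unless show_all is set
        change = None
        if _is_numeric(old_val) and _is_numeric(new_val):
            # Analogous to --precision: round the delta with Python's round().
            change = round(new_val - old_val, precision)
        result[name] = {"old": old_val, "new": new_val, "change": change}
    return result


if __name__ == "__main__":
    # Hypothetical metric values for two revisions.
    before = {"acc": 0.85, "loss": 0.4, "model": "v1"}
    after = {"acc": 0.9, "loss": 0.4, "model": "v2"}
    for name, row in diff_metrics(before, after).items():
        print(name, row)
```

Rounding the delta rather than the raw subtraction hides floating-point noise such as `0.9 - 0.85` evaluating to `0.05000000000000004`.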
diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index 4442c41081..660227b8e0 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -79,8 +79,8 @@ used to see what files `dvc pull` would download. - `-a`, `--all-branches` - determines the files to download by examining `dvc.yaml` and `.dvc` files in all Git branches instead of just those present in the current workspace. It's useful if branches are used to track - experiments or project checkpoints. Note that this can be combined with `-T` - below, for example using the `-aT` flag. + experiments. Note that this can be combined with `-T` below, for example using + the `-aT` flag. - `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as the workspace. Useful if tags are used to track "checkpoints" of an experiment diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index bddd54427d..5ced597f5e 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -62,8 +62,8 @@ in the cache (compared to the default remote.) It can be used to see what files - `-a`, `--all-branches` - determines the files to upload by examining `dvc.yaml` and `.dvc` files in all Git branches instead of just those present in the current workspace. It's useful if branches are used to track - experiments or project checkpoints. Note that this can be combined with `-T` - below, for example using the `-aT` flag. + experiments. Note that this can be combined with `-T` below, for example using + the `-aT` flag. - `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as the workspace. 
Useful if tags are used to track "checkpoints" of an experiment From 03a7affac04fb929885e9d6ad620bb6d92c05b84 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 27 Feb 2021 06:41:31 -0600 Subject: [PATCH 2/7] ref: update dag Desc (similar to exp show) --- content/docs/command-reference/dag.md | 52 ++++++++++++++------------- 1 file changed, 27 insertions(+), 25 deletions(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index e520db3464..6de313ffb3 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -1,7 +1,7 @@ # dag -Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of connected -[stages](/doc/command-reference/run). +Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of +connected [stages](/doc/command-reference/run). ## Synopsis @@ -15,6 +15,11 @@ positional arguments: ## Description +Displays the stages of a pipeline up to the `target` stage. If omitted, it will +show the full project DAG. + +### Directed acyclic graph + A data pipeline, in general, is a series of data processing [stages](/doc/command-reference/run) (for example, console commands that take an input and produce an output). A pipeline may produce intermediate @@ -32,29 +37,7 @@ restore one or more pipelines later (see `dvc repro`). > DVC builds a dependency graph > ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this. -`dvc dag` command displays the stages of a pipeline up to the target stage. If -`target` is omitted, it will show the full project DAG. - -## Options - -- `--full` - show full DAG that the `target` stage belongs to, instead of - showing only its ancestors. - -- `--dot` - show DAG in - [DOT]() - format. It can be passed to third party visualization utilities. - -- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the - stages themselves. The graph may be significantly different. 
- -- `-h`, `--help` - prints the usage/help message, and exit. - -- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no - problems arise, otherwise 1. - -- `-v`, `--verbose` - displays detailed tracing information. - -## Paginating the output +### Paginating the output This command's output is automatically piped to [less]() if available in the terminal @@ -83,6 +66,25 @@ example in Bash, we could add the following line to `~/.bashrc`: export DVC_PAGER=more ``` +## Options + +- `--full` - show full DAG that the `target` stage belongs to, instead of + showing only its ancestors. + +- `--dot` - show DAG in + [DOT]() + format. It can be passed to third-party visualization utilities. + +- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the + stages themselves. The graph may be significantly different. + +- `-h`, `--help` - prints the usage/help message, and exit. + +- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no + problems arise, otherwise 1. + +- `-v`, `--verbose` - displays detailed tracing information. + ## Example: Visualize a DVC Pipeline Visualize the prepare, featurize, train, and evaluate stages of a pipeline as From 76da523c5e6cc08c64a10362c8707e5133aae22c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 28 Feb 2021 05:01:20 -0600 Subject: [PATCH 3/7] ref: copy edits on gc --- content/docs/command-reference/gc.md | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/content/docs/command-reference/gc.md b/content/docs/command-reference/gc.md index 470316b961..40bbb28739 100644 --- a/content/docs/command-reference/gc.md +++ b/content/docs/command-reference/gc.md @@ -13,27 +13,24 @@ usage: dvc gc [-h] [-q | -v] [-w] [-a] [-T] [--all-commits] ## Description -This command deletes (garbage collects) data files or directories that exist in -DVC cache but are no longer needed.
With `--cloud` it also removes data in +This command can delete (garbage collect) data files or directories that exist +in the cache but are no longer needed. With `--cloud`, it also removes data in [remote storage](/doc/command-reference/remote). -To avoid accidentally deleting data, it raises an error and doesn't touch any -files if no scope options are provided. It means it's user's responsibility to -explicitly provide the right set of options to specify what data is still needed -(so that DVC can figure out what files can be safely deleted). +To avoid accidentally deleting data, `dvc gc` doesn't delete anything unless one +or a combination of scope options is provided (`--workspace`, `--all-branches`, +`--all-tags`, `--all-commits`, `--all-experiments`). Use these to indicate which cached files are +still needed. See the [Options](#options) section for more details. -One of the scope options (`--workspace`, `--all-branches`, `--all-tags`, -`--all-commits`, `--all-experiments`) or a combination of them must be provided. -Each of them corresponds to keeping the data for the current workspace, and for -a certain set of commits (determined by reading the DVC files in -them). See the [Options](#options) section for more details. +The data kept is determined by reading the DVC files in the set of +commits of the given scope. > Note that `dvc gc` tries to fetch any missing > [`.dir` files](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) > from [remote storage](/doc/command-reference/remote) to the local > cache, in order to determine which files should exist inside > cached directories. These files may be missing if the cache directory was -> previously garbage collected, in a newly cloned copy of the repo, etc. +> previously garbage collected, or in a newly cloned copy of the repo, etc. Unless the `--cloud` option is used, `dvc gc` does not remove data files from any remote. This means that any files collected from the local cache can be
This means that any files collected from the local cache can be From 68f61d9864f487359fe400a9220ec56b978ba4d3 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 28 Feb 2021 14:23:18 -0600 Subject: [PATCH 4/7] ref: edits to push/pull matching exp push/pull --- content/docs/command-reference/pull.md | 4 ++-- content/docs/command-reference/push.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index d6e880f1d0..37205827ef 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -112,8 +112,8 @@ used to see what files `dvc pull` would download. - `--run-cache` - downloads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from - the remote repository (to the cache only, like `dvc fetch --run-cache`). Note - that `dvc repro ` is necessary to checkout these files (into the + remote storage (to the cache only, like `dvc fetch --run-cache`). Note that + `dvc repro ` is necessary to checkout these files (into the workspace) and update `dvc.lock`. - `-j `, `--jobs ` - parallelism level for DVC to download data diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index e407c1fe34..5e890ebe70 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -90,7 +90,7 @@ in the cache (compared to the default remote.) It can be used to see what files - `--run-cache` - uploads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to - the remote repository. + remote storage. - `-j `, `--jobs ` - parallelism level for DVC to upload data to remote storage. The default value is `4 * cpu_count()`. 
For SSH remotes, the From c9ed4c63c2338abe662da0901cc506ca129c713e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Mar 2021 03:59:25 -0600 Subject: [PATCH 5/7] ref: updates per exp push/pull --- content/docs/command-reference/pull.md | 2 +- content/docs/command-reference/push.md | 2 +- content/docs/command-reference/repro.md | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index 37205827ef..8eaf1aad3c 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -112,7 +112,7 @@ used to see what files `dvc pull` would download. - `--run-cache` - downloads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from - remote storage (to the cache only, like `dvc fetch --run-cache`). Note that + the `dvc remote` (to the cache only, like `dvc fetch --run-cache`). Note that `dvc repro ` is necessary to checkout these files (into the workspace) and update `dvc.lock`. diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 5e890ebe70..6434936474 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -90,7 +90,7 @@ in the cache (compared to the default remote.) It can be used to see what files - `--run-cache` - uploads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to - remote storage. + the `dvc remote`. - `-j `, `--jobs ` - parallelism level for DVC to upload data to remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index 810b31b82f..e03ca6b10f 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -182,7 +182,8 @@ up-to-date and only execute the final stage. 
- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs involved in the stages being reproduced, if they are found in the [default remote storage](/doc/command-reference/remote/default). Note that it - tries the local run-cache first (unless `--no-run-cache` is also used). + includes pulling any available run-cache (unless `--no-run-cache` is also + used). - `-h`, `--help` - prints the usage/help message, and exit. From fbc2be4e9e507cfb1e5bdae79284d469541e207d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 13 Mar 2021 22:14:05 -0700 Subject: [PATCH 6/7] ref: update repro --pull desc per https://github.com/iterative/dvc.org/pull/2242#pullrequestreview-602583329 --- content/docs/command-reference/repro.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index e03ca6b10f..697ae46a5f 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -179,11 +179,10 @@ up-to-date and only execute the final stage. corresponding pipelines, including the target stages themselves. This option has no effect if `targets` are not provided. -- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs - involved in the stages being reproduced, if they are found in the - [default remote storage](/doc/command-reference/remote/default). Note that it - includes pulling any available run-cache (unless `--no-run-cache` is also - used). +- `--pull` - downloads dependencies and outputs in the stages being reproduced + from the [default remote storage](/doc/command-reference/remote/default) (see + `dvc pull`) based on the run-cache. Note that this doesn't include initial + pipeline data sources (never found in the run-cache). - `-h`, `--help` - prints the usage/help message, and exit. 
From 504080d8cb73b2de5095a2bbc8425b9a1eb7a19d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 14 Mar 2021 17:44:49 -0600 Subject: [PATCH 7/7] ref: update repro --pull per https://github.com/iterative/dvc.org/pull/2242#pullrequestreview-611662116 --- content/docs/command-reference/repro.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index 697ae46a5f..0f17391847 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -179,10 +179,11 @@ up-to-date and only execute the final stage. corresponding pipelines, including the target stages themselves. This option has no effect if `targets` are not provided. -- `--pull` - downloads dependencies and outputs in the stages being reproduced - from the [default remote storage](/doc/command-reference/remote/default) (see - `dvc pull`) based on the run-cache. Note that this doesn't include initial - pipeline data sources (never found in the run-cache). +- `--pull` - attempts to download outputs of stages found in the + [run-cache](/doc/user-guide/project-structure/internal-files#run-cache) during + reproduction. Uses the + [default remote storage](/doc/command-reference/remote/default). See also + `dvc pull`. - `-h`, `--help` - prints the usage/help message, and exit.
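Patch 2/7 describes `dvc dag` as displaying the stages of a pipeline up to the `target` stage (the target and its ancestors), while `--full` shows the whole DAG the target belongs to. As a rough illustration of that distinction only, the sketch below walks a hypothetical dependency mapping for the `prepare`, `featurize`, `train`, and `evaluate` stages named in the `dag` example (plus an unrelated `report` stage). The edges are assumed for illustration; DVC itself derives the graph from `dvc.yaml`, and `ancestors`/`full_component` are not DVC APIs.

```python
from collections import deque

# Hypothetical pipeline: each stage maps to the stages whose outputs it uses.
DEPS = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
    "report": [],  # an unrelated stage in a separate pipeline
}


def ancestors(deps, target):
    """Return the target stage plus every stage it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for dep in deps.get(stage, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def full_component(deps, target):
    """Return every stage connected to target, ignoring edge direction."""
    neighbors = {stage: set(parents) for stage, parents in deps.items()}
    for stage, parents in deps.items():
        for parent in parents:
            neighbors.setdefault(parent, set()).add(stage)
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for other in neighbors.get(stage, set()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors(DEPS, "featurize")))       # upstream stages only
    print(sorted(full_component(DEPS, "featurize")))  # whole connected pipeline
```

The default view follows dependency edges upstream only, whereas the `--full`-style view treats edges as undirected so that downstream stages (here `train` and `evaluate`) are included too.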