diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 08f673c665..3ae69324ac 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -1,7 +1,7 @@ # dag -Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of connected -[stages](/doc/command-reference/run). +Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of +connected [stages](/doc/command-reference/run). ## Synopsis @@ -15,6 +15,11 @@ positional arguments: ## Description +Displays the stages of a pipeline up to the `target` stage. If `target` is +omitted, it will show the full project DAG. + +### Directed acyclic graph + A data pipeline, in general, is a series of data processing [stages](/doc/command-reference/run) (for example, console commands that take an input and produce an outcome). The connections between stages are formed by the @@ -33,29 +38,7 @@ restore one or more pipelines later (see `dvc repro`). > DVC builds a dependency graph > ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this. -The `dvc dag` command displays the stages of a pipeline up to the target stage. -If `target` is omitted, it will show the full project DAG. - -## Options - -- `--full` - show full DAG that the `target` stage belongs to, instead of - showing only its ancestors. - -- `--dot` - show DAG in - [DOT]() - format. It can be passed to third party visualization utilities. - -- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the - stages themselves. The graph may be significantly different. - -- `-h`, `--help` - prints the usage/help message, and exit. - -- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no - problems arise, otherwise 1. - -- `-v`, `--verbose` - displays detailed tracing information.
- -## Paginating the output +### Paginating the output This command's output is automatically piped to [less]() if available in the terminal @@ -84,6 +67,25 @@ example in Bash, we could add the following line to `~/.bashrc`: export DVC_PAGER=more ``` +## Options + +- `--full` - show full DAG that the `target` stage belongs to, instead of + showing only its ancestors. + +- `--dot` - show DAG in + [DOT]() + format. It can be passed to third party visualization utilities. + +- `-o`, `--outs` - show a DAG of chained dependencies and outputs instead of the + stages themselves. The graph may be significantly different. + +- `-h`, `--help` - prints the usage/help message, and exit. + +- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no + problems arise, otherwise 1. + +- `-v`, `--verbose` - displays detailed tracing information. + ## Example: Visualize a DVC Pipeline Visualize the prepare, featurize, train, and evaluate stages of a pipeline as diff --git a/content/docs/command-reference/gc.md b/content/docs/command-reference/gc.md index 3208e5bb62..56b349d3e4 100644 --- a/content/docs/command-reference/gc.md +++ b/content/docs/command-reference/gc.md @@ -13,27 +13,24 @@ usage: dvc gc [-h] [-q | -v] [-w] [-a] [-T] [--all-commits] ## Description -This command deletes (garbage collects) data files or directories that exist in -DVC cache but are no longer needed. With `--cloud` it also removes data in +This command can delete (garbage collect) data files or directories that exist +in the cache but are no longer needed. With `--cloud`, it also removes data in [remote storage](/doc/command-reference/remote). -To avoid accidentally deleting data, it raises an error and doesn't touch any -files if no scope options are provided. It means it's user's responsibility to -explicitly provide the right set of options to specify what data is still needed -(so that DVC can figure out what files can be safely deleted). 
+To avoid accidentally deleting data, `dvc gc` doesn't do anything unless one or +a combination of scope options are provided (`--workspace`, `--all-branches`, +`--all-tags`, `--all-commits`). Use these to indicate which cached files are +still needed. See the [Options](#options) section for more details. -One of the scope options (`--workspace`, `--all-branches`, `--all-tags`, -`--all-commits`, `--all-experiments`) or a combination of them must be provided. -Each of them corresponds to keeping the data for the current workspace, and for -a certain set of commits (determined by reading the DVC files in -them). See the [Options](#options) section for more details. +The data kept is determined by reading the DVC files in the set of +commits of the given scope. > Note that `dvc gc` tries to fetch any missing > [`.dir` files](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) > from [remote storage](/doc/command-reference/remote) to the local > cache, in order to determine which files should exist inside > cached directories. These files may be missing if the cache directory was -> previously garbage collected, in a newly cloned copy of the repo, etc. +> previously garbage collected, or in a newly cloned copy of the repo, etc. Unless the `--cloud` option is used, `dvc gc` does not remove data files from any remote. This means that any files collected from the local cache can be diff --git a/content/docs/command-reference/metrics/diff.md b/content/docs/command-reference/metrics/diff.md index 4c89cc9b25..c5c58af6b4 100644 --- a/content/docs/command-reference/metrics/diff.md +++ b/content/docs/command-reference/metrics/diff.md @@ -3,6 +3,8 @@ Compare [metrics](/doc/command-reference/metrics) between two commits in the DVC repository, or between a commit and the workspace. +> Requires that Git is being used to version the project. 
+ ## Synopsis ```usage @@ -20,26 +22,29 @@ positional arguments: ## Description -This command provides a quick way to compare metrics among experiments in the -repository history. The differences shown by this command include the new value, -and numeric difference (delta) from the previous value of metrics (rounded to 5 +Provides a quick way to compare metrics among experiments in the repository +history. The differences shown by this command include the new value, and +numeric difference (delta) from the previous value of metrics (rounded to 5 digits precision). -`a_rev` and `b_rev` are Git commit hashes, tag, or branch names. If none are -specified, `dvc metrics diff` compares metrics currently present in the +Without arguments, `dvc metrics diff` compares metrics currently present in the workspace (uncommitted changes) with the latest committed versions -(required). A single specified revision results in comparing the workspace and -that version. +(required). Only metrics that changed are listed, by default (show everything +with `--all`). -All metrics defined in `dvc.yaml` are used by default, but specific metrics -files can be specified with the `--targets` option +`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to +compare. A single specified revision results in comparing it against the +workspace. > Note that targets don't necessarily have to be defined in `dvc.yaml`. For that > reason, this command doesn't require an existing DVC project to run in; It > works in any Git repo. -Another way to display metrics is the `dvc metrics show` command, which just -lists all the current metrics, without comparisons. +All metrics defined in `dvc.yaml` are used by default, but specific metrics +files can be specified with the `--targets` option. + +Another way to display metrics is the `dvc metrics show` command, which lists +all the current metrics (without comparisons). 
## Options @@ -61,20 +66,21 @@ lists all the current metrics, without comparisons. - `--all` - list all metrics, including those without changes. -- `--show-json` - prints the command's output in easily parsable JSON format, +- `--show-json` - prints the command's output in JSON format (machine-readable) instead of a human-readable table. -- `--show-md` - prints the command's output in Markdown table format. +- `--show-md` - prints the command's output in the Markdown table format + ([GFM](https://github.github.com/gfm/#tables-extension-)). -- `--old` - show old metric value in addition to the new value. +- `--old` - include the "Old" value column in addition to the new "Value" (and + "Change"). Useful when the values are not numeric, for example. -- `--no-path` - don't show metric path in the result table. This option is - useful when only one metrics file is in use or there is no intersection - between the metric names. +- `--no-path` - hide the "Path" column that lists the param/metrics file + location. Useful when only one metrics file exists, for example. - `--precision ` - - [round](https://docs.python.org/3/library/functions.html#round) metrics to `n` - digits precision after the decimal point. Rounds to 5 digits by default. + [round](https://docs.python.org/3/library/functions.html#round) decimal values + to `n` digits of precision (5 by default). - `-h`, `--help` - prints the usage/help message, and exit. diff --git a/content/docs/command-reference/params/diff.md b/content/docs/command-reference/params/diff.md index c8244ccc9c..ee3cbdceda 100644 --- a/content/docs/command-reference/params/diff.md +++ b/content/docs/command-reference/params/diff.md @@ -4,6 +4,8 @@ Show changes in [parameters](/doc/command-reference/params) between commits in the DVC repository, or between a commit and the workspace. +> Requires that Git is being used to version the project.
+ ## Synopsis ```usage @@ -20,21 +22,28 @@ positional arguments: ## Description Provides a quick way to compare parameter values among experiments in the -repository history. Requires that Git is being used to version the project -params. +repository history. The differences shown by this command include the old and +new param values, along with the param name. > Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g. > with the the `-p` (`--params`) option of `dvc run`). -Without arguments, this command compares parameters currently present in the -workspace (uncommitted changes) with the latest committed version. -This includes everything in `params.yaml` (default parameters file) as well all -the `params` used in `dvc.yaml`. Values in `dvc.lock` are used for comparison. -Only params that have changes are listed. +Without arguments, `dvc params diff` compares parameters currently present in +the workspace (uncommitted changes) with the latest committed +versions (required). This includes everything in `params.yaml` (default +parameters file) as well as all the `params` used in `dvc.yaml`. Values in +`dvc.lock` are used for comparison. Only params that have changes are listed. + +`a_rev` and `b_rev` are optional Git commit hashes, tags, or branch names to +compare. A single specified revision results in comparing it against the +workspace. + +All params defined in `dvc.yaml` are used by default, but specific ones can be +specified with the `--targets` option. -> Note that unlike `dvc diff`, this command doesn't always need DVC files to -> find params files (see `--targets` option). For that reason, it doesn't -> require an existing DVC project to run in. It can work in any Git repo. +> Note that targets don't necessarily have to be defined in `dvc.yaml`. For that +> reason, this command doesn't require an existing DVC project to run in. It can +> work in any Git repo.
## Options diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md index 1620e742da..cdea58d2f8 100644 --- a/content/docs/command-reference/plots/diff.md +++ b/content/docs/command-reference/plots/diff.md @@ -27,7 +27,7 @@ versions of the repository, by overlaying them in a single plot. > Note that unlike `dvc metrics diff`, this command does not calculate numeric > differences between plots file values. -`revisions` are Git commit hashes, tag, or branch names. If none are specified, +`revisions` are Git commit hashes, tags, or branch names. If none are specified, `dvc plots diff` compares plots currently present in the workspace (uncommitted changes) with their latest commit (required). A single specified revision results in comparing the workspace and that version. diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index 0a48356aa3..40c477e48f 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -79,8 +79,8 @@ used to see what files `dvc pull` would download. - `-a`, `--all-branches` - determines the files to download by examining `dvc.yaml` and `.dvc` files in all Git branches instead of just those present in the current workspace. It's useful if branches are used to track - experiments or project checkpoints. Note that this can be combined with `-T` - below, for example using the `-aT` flag. + experiments. Note that this can be combined with `-T` below, for example using + the `-aT` flag. - `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as the workspace. Useful if tags are used to mark certain versions of an @@ -112,8 +112,8 @@ used to see what files `dvc pull` would download. - `--run-cache` - downloads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from - the remote repository (to the cache only, like `dvc fetch --run-cache`). 
Note - that `dvc repro ` is necessary to checkout these files (into the + the `dvc remote` (to the cache only, like `dvc fetch --run-cache`). Note that + `dvc repro ` is necessary to check out these files (into the workspace) and update `dvc.lock`. - `-j `, `--jobs ` - parallelism level for DVC to download data diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 65fd831904..6bc86f93bd 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -62,8 +62,8 @@ in the cache (compared to the default remote.) It can be used to see what files - `-a`, `--all-branches` - determines the files to upload by examining `dvc.yaml` and `.dvc` files in all Git branches instead of just those present in the current workspace. It's useful if branches are used to track - experiments or project checkpoints. Note that this can be combined with `-T` - below, for example using the `-aT` flag. + experiments. Note that this can be combined with `-T` below, for example using + the `-aT` flag. - `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as the workspace. Useful if tags are used to mark certain versions of an @@ -90,7 +90,7 @@ in the cache (compared to the default remote.) It can be used to see what files - `--run-cache` - uploads all available history of [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to - the remote repository. + the `dvc remote`. - `-j `, `--jobs ` - parallelism level for DVC to upload data to remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index 810b31b82f..0f17391847 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -179,10 +179,11 @@ up-to-date and only execute the final stage. corresponding pipelines, including the target stages themselves.
This option has no effect if `targets` are not provided. -- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs - involved in the stages being reproduced, if they are found in the - [default remote storage](/doc/command-reference/remote/default). Note that it - tries the local run-cache first (unless `--no-run-cache` is also used). +- `--pull` - attempts to download outputs of stages found in the + [run-cache](/doc/user-guide/project-structure/internal-files#run-cache) during + reproduction. Uses the + [default remote storage](/doc/command-reference/remote/default). See also + `dvc pull`. - `-h`, `--help` - prints the usage/help message, and exit.
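
Reviewer illustration for the `dag.md` hunk: the new description says `dvc dag` displays the stages "up to the `target` stage", and the `--full` option describes the default as showing only the target's ancestors. The Python sketch below shows that selection on a toy graph. It is not DVC's implementation: the `upstream` mapping and the `stages_up_to` helper are hypothetical names, and the dependency edges between the prepare, featurize, train, and evaluate stages are assumed for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical pipeline: each stage maps to the stages it depends on.
# The edges are assumptions made for this illustration.
upstream: Dict[str, List[str]] = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
}


def stages_up_to(graph: Dict[str, List[str]], target: Optional[str] = None) -> Set[str]:
    """Return the target stage plus all of its ancestors.

    With no target, return every stage (the full project DAG).
    """
    if target is None:
        return set(graph)
    seen: Set[str] = set()
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage in seen:
            continue  # already visited through another path
        seen.add(stage)
        stack.extend(graph[stage])  # walk towards the stage's dependencies
    return seen


if __name__ == "__main__":
    print(sorted(stages_up_to(upstream, "featurize")))
    print(sorted(stages_up_to(upstream)))
```

The traversal only follows dependency edges backwards from the target, so stages downstream of it (here `train` and `evaluate` when the target is `featurize`) are left out, which is the behaviour the description attributes to the default view.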