diff --git a/content/docs/command-reference/cache/index.md b/content/docs/command-reference/cache/index.md
index 07c8fb54c6..f9c1bc91f4 100644
--- a/content/docs/command-reference/cache/index.md
+++ b/content/docs/command-reference/cache/index.md
@@ -15,16 +15,15 @@ positional arguments:
## Description
-At DVC initialization, a new `.dvc/` directory is created for internal
-configuration and cache
-[files and directories](/doc/user-guide/dvc-files-and-directories#internal-directories-and-files),
-that are hidden from the user.
-
-The cache is where your data files, models, etc. (anything you want to version
-with DVC) are actually stored. The corresponding files you see in the
-workspace can simply link to the ones in cache. (Refer to
-[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
-for more information on file links on different platforms.)
+The DVC cache is where your data files, models, etc. (anything you want to
+version with DVC) are actually stored. The data files and directories visible in
+the workspace are links\* to (or copies of) the ones in the cache.
+Learn more about its
+[structure](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).
+
+> \* Refer to
+> [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
+> for more information on file links on different platforms.
> For more cache-related configuration options refer to `dvc config cache`.
diff --git a/content/docs/command-reference/fetch.md b/content/docs/command-reference/fetch.md
index 8a618ee79c..83725c165e 100644
--- a/content/docs/command-reference/fetch.md
+++ b/content/docs/command-reference/fetch.md
@@ -35,7 +35,7 @@ Fetching is performed automatically by `dvc pull` (when the data is not already
in the cache), along with `dvc checkout`:
```
-Controlled files Commands
+Tracked files Commands
---------------- ---------------------------------
remote storage
@@ -277,4 +277,4 @@ into the workspace (with `dvc repro train.dvc`).
> Note that in this example project, the last stage file `evaluate.dvc` doesn't
> add any more data files than those form previous stages, so at this point all
> of the data for this pipeline is cached and `dvc status -c` would output
-> `Data and pipelines are up to date.`
+> `Cache and remote 'myremote' are in sync.`
diff --git a/content/docs/command-reference/init.md b/content/docs/command-reference/init.md
index d4aad4fb53..6a3cb03e46 100644
--- a/content/docs/command-reference/init.md
+++ b/content/docs/command-reference/init.md
@@ -116,11 +116,11 @@ In rare cases, the `--no-scm` option might be desirable: to initialize DVC in a
directory that is not part of a Git repo, or to make DVC ignore Git. Examples
include:
-- Version control other than Git is being used. Even though there are DVC
- features that require DVC to be run in the Git repo, DVC can work well with
- other version control systems. Since DVC relies on simple `dvc.yaml` files to
- manage pipelines, data, etc, they can be added into any version
- control system, thus providing large data files and directories versioning.
+- An SCM other than Git is being used. Even though there are DVC features that
+ require DVC to be run in a Git repo, DVC can work well with other version
+ control systems. Since DVC relies on simple `dvc.yaml` files to manage
+ pipelines, data, etc., they can be added into any version control
+ system, thus providing versioning for large data files and directories.
- There is no need to keep the history at all, e.g. having a deployment
automation like running a data pipeline using `cron`.
diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md
index d0bf5504ab..5bcd21353d 100644
--- a/content/docs/command-reference/plots/diff.md
+++ b/content/docs/command-reference/plots/diff.md
@@ -19,9 +19,9 @@ positional arguments:
## Description
-This command is a way to visualize the "difference" between metrics among
-experiments in the repository history, by plotting multiple
-versions of the metrics. All plots defined in `dvc.yaml` are used by default.
+This command is a way to visualize the "difference" between
+[certain metrics](/doc/command-reference/plots#supported-file-formats) among
+versions of the repository, by overlaying them in a single plot.
> Note that unlike `dvc metrics diff`, this command does not calculate numeric
> differences between metrics file values.
@@ -34,8 +34,9 @@ revision results in comparing the workspace and that version.
💡 Note that any number of `revisions` can be provided, and the resulting plot
shows all of them in a single image.
-Specific plots files can be specified with the `--targets` option. Note that
-these don't have to be defined as `plots` in `dvc.yaml`.
+All plots defined in `dvc.yaml` are used by default, but specific plots files
+can be specified with the `--targets` option (note that targets don't
+necessarily have to be defined in `dvc.yaml`).
The plot style can be customized with
[plot templates](/doc/command-reference/plots#plot-templates), using the
diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md
index 7c1616fe9c..9a9af955bf 100644
--- a/content/docs/command-reference/plots/index.md
+++ b/content/docs/command-reference/plots/index.md
@@ -29,15 +29,13 @@ learning training or data processing:
## Description
-DVC provides a set of commands to visualize metrics of machine learning
-experiments. Usual plot examples are AUC curves, loss functions, confusion
-matrices, among others.
+DVC provides a set of commands to visualize certain metrics of machine learning
+experiments as plots. Usual plot examples are AUC curves, loss functions,
+confusion matrices, among others.
This type of metrics files are created by users, or generated by user data
-processing code, and get defined with the `-p` (`--plots`) and
-`--plots-no-cache`) options of `dvc run`. `dvc plots` subcommands can work with
-plots files committed to a Git repo history, data files controlled by DVC, or
-any other file in system.
+processing code, and can optionally be defined in `dvc.yaml` (`plots` field)
+for tracking.
DVC generates plots as HTML files that can be open with a web browser. These
HTML files use [Vega-Lite](https://vega.github.io/vega-lite/). Vega is a
diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md
index 326fb34444..441380cb5e 100644
--- a/content/docs/command-reference/plots/show.md
+++ b/content/docs/command-reference/plots/show.md
@@ -18,12 +18,13 @@ positional arguments:
## Description
-This command provides a quick way to visualize metrics such as loss functions,
-AUC curves, confusion matrices, etc. All plots defined in `dvc.yaml` are used by
-default.
+This command provides a quick way to visualize
+[certain metrics](/doc/command-reference/plots#supported-file-formats) such as
+loss functions, AUC curves, confusion matrices, etc.
-Optionally, specific metric file `targets` to show are accepted. Note that these
-don't have to be defined as `plots` in `dvc.yaml`.
+All plots defined in `dvc.yaml` are used by default, but specific plots files
+can be specified as `targets` (note that targets don't necessarily have to be
+defined in `dvc.yaml`).
The plot style can be customized with
[plot templates](/doc/command-reference/plots#plot-templates), using the
diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md
index bdf334dd51..7ba525ab0c 100644
--- a/content/docs/command-reference/pull.md
+++ b/content/docs/command-reference/pull.md
@@ -37,7 +37,7 @@ to `dvc config cache.type`).
It has the same effect as running `dvc fetch` and `dvc checkout`:
```
-Controlled files Commands
+Tracked files Commands
---------------- ---------------------------------
remote storage
@@ -112,8 +112,9 @@ used to see what files `dvc pull` would download.
`dvc remote list`).
- `--run-cache` - downloads all available history of stage runs from the remote
- repository into the local run-cache. A `dvc repro ` is necessary
- to checkout these files into the workspace and update the `dvc.lock` file.
+ repository (to the cache only, like `dvc fetch --run-cache`). Note that
+ `dvc repro ` is necessary to check out these files (into the
+ workspace) and update `dvc.lock`.
- `-j `, `--jobs ` - parallelism level for DVC to download data
from remote storage. This only applies when the `--cloud` option is used, or a
diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md
index 72cbae4a47..b8e045949b 100644
--- a/content/docs/command-reference/push.md
+++ b/content/docs/command-reference/push.md
@@ -168,7 +168,7 @@ $ dvc push --with-deps matrix-train
... Push the rest of the data
$ dvc status --cloud
-Data and pipelines are up to date.
+Cache and remote 'r1' are in sync.
```
We specified a stage in the middle of this pipeline (`test-posts`) with the
@@ -259,7 +259,7 @@ $ tree ~/vault/recursive
10 directories, 10 files
$ dvc status --cloud
-Data and pipelines are up to date.
+Cache and remote 'r1' are in sync.
```
And running `dvc status --cloud`, DVC verifies that indeed there are no more
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index 21bd84b763..37fc319574 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -39,7 +39,7 @@ and caches relevant data artifacts along the way.
needed after a `git commit`. See `dvc install` for more details.
`dvc repro` does not run `dvc fetch`, `dvc pull` or `dvc checkout` to get data
-files, intermediate or final results.
+files, intermediate or final results (unless the `--pull` option is used).
By default, this command checks all pipeline stages to determine which ones have
changed. Then it executes the corresponding commands. Outputs are
@@ -135,7 +135,7 @@ up-to-date and only execute the final stage.
present in the DVC project.
- `--no-run-cache` - execute stage commands even if they have already been run
- with the same command/dependencies/outputs/etc before.
+ with the same dependencies/outputs/etc. before.
- `--force-downstream` - in cases like `... -> A (changed) -> B -> C` it will
reproduce `A` first and then `B`, even if `B` was previously executed with the
@@ -157,8 +157,12 @@ up-to-date and only execute the final stage.
corresponding pipelines, including the target stages themselves. This option
has no effect if `targets` are not provided.
-- `--pull` - try automatically [pulling](/doc/command-reference/pull) missing
- cache for outputs restored from run-cache.
+- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs
+ involved in the stages being reproduced, if they are found in the
+ [default](/doc/command-reference/remote/default) remote storage. Note that it
+ checks the local run-cache too (available history of stage runs).
+
+ > Has no effect if combined with `--no-run-cache`.
- `-h`, `--help` - prints the usage/help message, and exit.
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index 6eef7e1605..e9e19633cf 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -170,9 +170,8 @@ $ dvc run -n my_stage './my_script.sh $MYENVVAR'
- `-O `, `--outs-no-cache ` - the same as `-o` except that outputs
are not tracked by DVC. It means that they are not cached, and it's up to a
- user to save and version control them. This is useful if the outputs are small
- enough to be tracked by Git directly, or if these files are not of future
- interest.
+ user to manage them separately. This is useful if the outputs are small enough
+ to be tracked by Git directly, or if these files are not of future interest.
- `--outs-persist ` - declare output file or directory that will not be
removed upon `dvc repro`.
@@ -197,9 +196,9 @@ $ dvc run -n my_stage './my_script.sh $MYENVVAR'
- `-M `, `--metrics-no-cache ` - the same as `-m` except that DVC
does not track the metrics file. This means that the file is not cached, so
- it's up to the user to save and version control it. This is typically
- desirable with _metrics_ because they are small enough to be tracked with Git
- directly. See also the difference between `-o` and `-O`.
+ it's up to the user to manage it separately. This is typically desirable
+ with _metrics_ because they are small enough to be tracked with Git directly.
+ See also the difference between `-o` and `-O`.
- `--plots ` - specify a plot metrics file produces by this stage. This
option behaves like `-o` but registers the file in a `plots` field inside the
@@ -210,8 +209,8 @@ $ dvc run -n my_stage './my_script.sh $MYENVVAR'
- `--plots-no-cache ` - the same as `--plots` except that DVC does not
track the plots metrics file. This means that the file is not cached, so it's
- up to the user to save and version control it. See also the difference between
- `-o` and `-O`.
+ up to the user to manage it separately. See also the difference between `-o`
+ and `-O`.
- `-w `, `--wdir ` - specifies a working directory for the `command`
to run in (uses the `wdir` field in `dvc.yaml`). Dependency and output files
@@ -231,10 +230,10 @@ $ dvc run -n my_stage './my_script.sh $MYENVVAR'
- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation.
-- `--no-run-cache` - forcefully execute the `command` again, even if the same
- `dvc run` command has already been run in this workspace. Useful if the
- command's code is non-deterministic (meaning it produces different outputs
- from the same list of inputs).
+- `--no-run-cache` - execute the stage `command` even if it has already been run
+ with the same dependencies/outputs/etc. before. Useful, for example, if the
+ command's code is non-deterministic
+ ([not recommended](#avoiding-unexpected-behavior)).
- `--no-commit` - do not save outputs to cache. A stage created and an entry is
added to `.dvc/state`, while nothing is added to the cache. In the stage file,
diff --git a/content/docs/command-reference/status.md b/content/docs/command-reference/status.md
index 5b80d814ca..e3f2d26774 100644
--- a/content/docs/command-reference/status.md
+++ b/content/docs/command-reference/status.md
@@ -19,10 +19,10 @@ positional arguments:
## Description
-`dvc status` searches for changes in the existing tracked data and pipelines,
-either showing which files or directories have changed in the
-workspace and should be added or reproduced again (with `dvc add`
-or `dvc repro`); or differences between cache vs. remote storage
+Searches for changes in the existing tracked data and pipelines, either showing
+which files or directories have changed in the workspace and should
+be added or reproduced again (with `dvc add` or `dvc repro`); or differences
+between cache vs. [remote storage](/doc/command-reference/remote)
(implying `dvc push` or `dvc pull` should be run to synchronize them). The
_remote_ mode is triggered by using the `--cloud` or `--remote` options:
@@ -43,11 +43,12 @@ paths to tracked files or directories (including paths inside tracked
directories), `.dvc` files, and stage names (found in `dvc.yaml`).
If no differences are detected, `dvc status` prints
-`Data and pipelines are up to date.` If differences are detected by
-`dvc status`, the command output indicates the changes. For each stage with
-differences, the changes in dependencies and/or
-outputs that differ are listed. For each item listed, either the
-file name or hash is shown, along with a _state description_, as detailed below:
+`Data and pipelines are up to date.` or
+`Cache and remote 'myremote' are in sync.` (if the `-c` or `-r` options are
+used). If differences are detected, the changes in dependencies
+and/or outputs for each stage that differs are listed. For each
+item listed, either the file name or hash is shown, along with a _state
+description_, as detailed below:
### Local workspace status
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index f86cf0295a..aed67a329e 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -71,7 +71,7 @@
"children": ["tutorial"]
},
{
- "label": "Sharing Data & Model Files",
+ "label": "Sharing Data and Model Files",
"slug": "sharing-data-and-model-files"
},
"shared-development-server",
diff --git a/content/docs/use-cases/index.md b/content/docs/use-cases/index.md
index bc316c479b..2eb692b461 100644
--- a/content/docs/use-cases/index.md
+++ b/content/docs/use-cases/index.md
@@ -18,9 +18,8 @@ knowledge, they are still difficult to implement, reuse, and manage.
If you store and process data files or datasets to produce other data or machine
learning models, and you want to
-- capture and save data artifacts the same way you capture code;
-- track, control, and switch between different versions of data or models
- easily;
+- track and save data artifacts the same way you capture code;
+- create and switch among different versions of data or models easily;
- understand how data or ML models were built in the first place;
- compare machine learning models and metrics to each other;
- bring software engineering best practices and tools to your data science team
diff --git a/content/docs/use-cases/versioning-data-and-model-files/tutorial.md b/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
index 44588ad577..0810bf8d1a 100644
--- a/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
+++ b/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
@@ -11,7 +11,7 @@ to build a powerful image classifier using a pretty small dataset.
> We highly recommend reading the François' tutorial itself. It's a great
> demonstration of how a general pre-trained model can be leveraged to build a
-> new highly performant model, with very limited resources.
+> new high-performance model, with very limited resources.
We first train a classifier model using 1000 labeled images, then we double the
number of images (2000) and retrain our model. We capture both datasets and
diff --git a/content/docs/user-guide/basic-concepts/dvc-cache.md b/content/docs/user-guide/basic-concepts/dvc-cache.md
index 1d080775f4..49c0644100 100644
--- a/content/docs/user-guide/basic-concepts/dvc-cache.md
+++ b/content/docs/user-guide/basic-concepts/dvc-cache.md
@@ -4,6 +4,6 @@ match: ['DVC cache', cache, caches, cached]
---
The DVC cache is a hidden storage (by default located in the `.dvc/cache`
-directory) for files that are under DVC control, and their different versions.
-For more details, please refer to this
-[document](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).
+directory) for files that are tracked by DVC, and their different versions.
+Learn more about its
+[structure](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).