diff --git a/content/docs/api-reference/get_url.md b/content/docs/api-reference/get_url.md
index 0cdb531271..27aed63526 100644
--- a/content/docs/api-reference/get_url.md
+++ b/content/docs/api-reference/get_url.md
@@ -36,7 +36,7 @@ URL returned depends on the
`remote` used (see the [Parameters](#parameters) section).
If the target is a directory, the returned URL will end in `.dir`. Refer to
-[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
and `dvc add` to learn more about how DVC handles data directories.
⚠️ This function does not check for the actual existence of the file or
diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 75501d2eb9..1173c78b41 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -38,7 +38,7 @@ each one:
1. Calculate the file hash.
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file path. (See
- [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+ [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details on
file linking further down).
@@ -71,7 +71,7 @@ large files. DVC also supports other link types for use on file systems without
`reflink` support, but they have to be specified manually. Refer to the
`cache.type` config option in `dvc config cache` for more information.
-### Tracking directories
+### Adding entire directories
A `dvc add` target can be an individual file or a directory. In the latter case,
a `.dvc` file is created for the top of the directory (with default name
@@ -83,9 +83,13 @@ in the directory tree. Instead, the single `.dvc` file references a special JSON
file in the cache (with `.dir` extension), that in turn points to the added
files.
-Note that DVC commands that use tracked files support granular targeting of
-files, even when the directory is added as a whole. Examples: `dvc push`,
-`dvc pull`, `dvc get`, `dvc import`, etc.
+> Refer to
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
+> for more info on `.dir` cache entries.
+
+Note that DVC commands that use tracked data support granular targeting of files
+and directories, even when contained in a parent directory added as a whole.
+Examples: `dvc push`, `dvc pull`, `dvc get`, `dvc import`, etc.
As a rarely needed alternative, the `--recursive` option causes every file in
the hierarchy to be added individually. A corresponding `.dvc` file will be
@@ -192,9 +196,8 @@ outs:
path: pics
```
-> Refer to
-> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
-> for more info.
+> Refer to [Adding entire directories](#adding-entire-directories) for more
+> info.
This allows us to treat the entire directory structure as a single data
artifact. For example, you can pass the whole directory tree as a
diff --git a/content/docs/command-reference/cache/dir.md b/content/docs/command-reference/cache/dir.md
index cc8bc49ddc..d2c64e119d 100644
--- a/content/docs/command-reference/cache/dir.md
+++ b/content/docs/command-reference/cache/dir.md
@@ -17,7 +17,7 @@ positional arguments:
## Description
Helper to set the `cache.dir` configuration option. (See
-[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
+[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).)
Unlike doing so with `dvc config cache`, this command transform paths (`value`)
that are provided relative to the current working directory into paths
**relative to the config file location**. However, if the `value` provided is an
diff --git a/content/docs/command-reference/config.md b/content/docs/command-reference/config.md
index b683cec054..9c5af7052f 100644
--- a/content/docs/command-reference/config.md
+++ b/content/docs/command-reference/config.md
@@ -99,7 +99,7 @@ remote. See `dvc remote` for more information.
A DVC project cache is the hidden storage (by default located in
the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
-[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.) This section contains the following options:
- `cache.dir` - set/unset cache directory location. A correct value must be
diff --git a/content/docs/command-reference/fetch.md b/content/docs/command-reference/fetch.md
index 36d3194ff8..af3abf8509 100644
--- a/content/docs/command-reference/fetch.md
+++ b/content/docs/command-reference/fetch.md
@@ -189,7 +189,7 @@ $ tree .dvc
Note that the `.dvc/cache` directory was created and populated.
> Refer to
-> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Used without arguments (as above), `dvc fetch` downloads all assets needed by
diff --git a/content/docs/command-reference/gc.md b/content/docs/command-reference/gc.md
index d3d17312d5..2fb92be3f2 100644
--- a/content/docs/command-reference/gc.md
+++ b/content/docs/command-reference/gc.md
@@ -29,7 +29,7 @@ of commits (determined by reading the DVC-files in them). See the
[Options](#options) section for more details.
> Note that `dvc gc` tries to fetch any missing
-> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> from [remote storage](/doc/command-reference/remote) to the local
> cache, in order to know which files should exist inside cached
> directories. These files may be missing if the cache directory was previously
diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md
index 33ff39cc9b..e9168bb81b 100644
--- a/content/docs/command-reference/push.md
+++ b/content/docs/command-reference/push.md
@@ -194,7 +194,7 @@ Finally, we used `dvc status` to double check that all data had been uploaded.
## Example: What happens in the cache?
Let's take a detailed look at what happens to the
-[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
as you run an experiment locally and push data to remote storage. To set the
example consider having created a workspace that contains some code
and data, and having set up a remote.
@@ -242,7 +242,7 @@ the cache having more files in it than the remote – which is what the `new`
state means.
> Refer to
-> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Next we can copy the remaining data from the cache to the remote using
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index e94a797c6b..7746a9cfdf 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -73,23 +73,38 @@ $ dvc run -n printer -d write.sh -o pages ./write.sh
$ dvc run -n scanner -d read.sh -d pages -o signed.pdf ./read.sh
```
+Stage dependencies can be any file or directory, either untracked, or more
+commonly tracked by DVC or Git. Outputs will be tracked and cached
+by DVC when the stage is run. Every output version will be cached when the stage
+is reproduced (see also `dvc gc`).
+
Relevant notes:
-- Typically, scripts being run (or a directory containing the source code) are
- included among the specified `-d` dependencies. This ensures that when the
- source code changes, DVC knows that the stage needs to be reproduced. (You can
- chose whether to do this.)
+- Typically, scripts being run (or possibly a directory containing the source
+ code) are included among the specified `-d` dependencies. This ensures that
+ when the source code changes, DVC knows that the stage needs to be reproduced.
+ (You can choose whether to do this.)
- `dvc run` checks the dependency graph integrity before creating a new stage.
- For example: two stage cannot explicitly specify the same output, there should
- be no cycles, etc.
+ For example: two stages cannot specify the same output or overlapping output
+ paths, there should be no cycles, etc.
- DVC does not feed dependency files to the command being run. The program will
have to read by itself the files specified with `-d`.
-- Outputs are deleted from the workspace before executing the
- command (including at `dvc repro`), so it should be able to recreate any
- directories marked as outputs.
+- Entire directories produced by the stage can be tracked as outputs by DVC,
+ which generates a single `.dir` entry in the cache (refer to
+ [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
+ for more info).
+
+- [External dependencies](/doc/user-guide/external-dependencies) and
+ [external outputs](/doc/user-guide/managing-external-data) (outside of the
+ workspace) are also supported.
+
+- Outputs are deleted from the workspace before executing the command (including
+ at `dvc repro`) if their paths are found as existing files/directories. This
+ also means that the stage command needs to recreate any directory structures
+ defined as outputs every time it's executed by DVC.
### For displaying and comparing data science experiments
@@ -117,7 +132,7 @@ systems and require certain software packages to be installed.
Wrap the command with double quotes `"` if there are special characters in it
like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
-`dvc run` as a whole. Use single quotes `'` instead if there are environment
+`dvc run` itself. Use single quotes `'` instead if there are environment
variables in it that should be evaluated dynamically. Examples:
```dvc
diff --git a/content/docs/user-guide/basic-concepts/dvc-cache.md b/content/docs/user-guide/basic-concepts/dvc-cache.md
index b1afec5846..1d080775f4 100644
--- a/content/docs/user-guide/basic-concepts/dvc-cache.md
+++ b/content/docs/user-guide/basic-concepts/dvc-cache.md
@@ -6,4 +6,4 @@ match: ['DVC cache', cache, caches, cached]
The DVC cache is a hidden storage (by default located in the `.dvc/cache`
directory) for files that are under DVC control, and their different versions.
For more details, please refer to this
-[document](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).
+[document](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).
diff --git a/content/docs/user-guide/dvc-files-and-directories.md b/content/docs/user-guide/dvc-files-and-directories.md
index d5e78bfea8..48b9c3a34d 100644
--- a/content/docs/user-guide/dvc-files-and-directories.md
+++ b/content/docs/user-guide/dvc-files-and-directories.md
@@ -236,7 +236,7 @@ separately under `params`, grouped by parameters file.
hand or with the command `dvc config --local`.
- `.dvc/cache`: The cache directory will store your data in a
- special [structure](#structure-of-cache-directory). The data files and
+ special [structure](#structure-of-the-cache-directory). The data files and
directories in the workspace will only contain links to the data
files in the cache. (Refer to
[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization). See
@@ -277,7 +277,7 @@ separately under `params`, grouped by parameters file.
dependencies and outputs, to allow safely running multiple DVC commands in
parallel
-## Structure of cache directory
+## Structure of the cache directory
There are two ways in which the data is stored in cache: As a
single file (eg. `data.csv`), or a directory of files.
diff --git a/content/docs/user-guide/dvcignore.md b/content/docs/user-guide/dvcignore.md
index eceef9dff1..5fc2ca3291 100644
--- a/content/docs/user-guide/dvcignore.md
+++ b/content/docs/user-guide/dvcignore.md
@@ -85,7 +85,7 @@ Only the hash values of a directory (`data/`) and one file have been
(`data1`).
> Refer to
-> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Now, let's modify file `data1` and see if it affects `dvc status`.
diff --git a/content/docs/user-guide/setup-google-drive-remote.md b/content/docs/user-guide/setup-google-drive-remote.md
index d8d7d42b50..2c874bd89f 100644
--- a/content/docs/user-guide/setup-google-drive-remote.md
+++ b/content/docs/user-guide/setup-google-drive-remote.md
@@ -18,10 +18,13 @@ to establish GDrive remote connections (e.g. CI/CD).
## Quick start
To start using a Google Drive remote, you only need to add it with a
-[valid URL format](#url-format). Then use any DVC command that needs it (e.g.
-`dvc pull`, `dvc fetch`, `dvc push`). For example:
+[valid URL format](#url-format). Then use any DVC command that needs to connect
+to it (e.g. `dvc pull` or `dvc push` once there's tracked data to synchronize).
+For example:
```dvc
+$ dvc add data
+...
$ dvc remote add --default myremote \
gdrive://0AIac4JZqHhKmUk9PDA/dvcstore
$ dvc push
@@ -192,9 +195,10 @@ authentication is needed.
## Authorization
On the first usage of a GDrive [remote](/doc/command-reference/remote), for
-example when trying to `dvc push` for the first time after adding the remote
-with a [valid URL](#url-format), DVC will prompt you to visit a special Google
-authentication web page. There you'll need to sign into your Google account. The
+example when trying to `dvc push` tracked data for the first time, DVC will
+prompt you to visit a special Google authentication web page. There you'll need
+to sign into a Google account with the needed access to the GDrive
+[URL](#url-format) in question. The
[auth process](https://developers.google.com/drive/api/v2/about-auth) will ask
you to grant DVC the necessary permissions, and produce a verification code
needed for DVC to complete the connection. On success, the necessary credentials