Commit

Merge pull request #1662 from iterative/jorge
docs: misc updates
jorgeorpinel authored Aug 5, 2020
2 parents 085ec59 + c02e81e commit 138acdf
Showing 12 changed files with 56 additions and 34 deletions.
2 changes: 1 addition & 1 deletion content/docs/api-reference/get_url.md
@@ -36,7 +36,7 @@ URL returned depends on the
`remote` used (see the [Parameters](#parameters) section).

If the target is a directory, the returned URL will end in `.dir`. Refer to
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
and `dvc add` to learn more about how DVC handles data directories.

⚠️ This function does not check for the actual existence of the file or
19 changes: 11 additions & 8 deletions content/docs/command-reference/add.md
@@ -38,7 +38,7 @@ each one:
1. Calculate the file hash.
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file path. (See
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details on
file linking further down).
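
The first two steps above can be sketched in Python. This is only an illustration, not DVC's actual implementation: it assumes an MD5 content hash and a cache layout in which the first two hex characters of the hash name a subdirectory. The helper names `file_md5` and `cache_path` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir, file_hash):
    """Form the cached file path from the hash.

    Assumed layout (hypothetical here): <cache_dir>/<first 2 chars>/<rest>.
    """
    return Path(cache_dir) / file_hash[:2] / file_hash[2:]
```

For example, `cache_path(".dvc/cache", file_md5("data.csv"))` would give the location the file contents are moved to under these assumptions.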
@@ -71,7 +71,7 @@ large files. DVC also supports other link types for use on file systems without
`reflink` support, but they have to be specified manually. Refer to the
`cache.type` config option in `dvc config cache` for more information.

### Tracking directories
### Adding entire directories

A `dvc add` target can be an individual file or a directory. In the latter case,
a `.dvc` file is created for the top of the directory (with default name
@@ -83,9 +83,13 @@ in the directory tree. Instead, the single `.dvc` file references a special JSON
file in the cache (with `.dir` extension), that in turn points to the added
files.
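
As a rough sketch of what such a `.dir` entry might contain, the snippet below builds a JSON-serializable list with one record per file in a directory. The field names (`md5`, `relpath`), the way the manifest itself is hashed, and the helper names are assumptions made for illustration, not a description of DVC internals.

```python
import hashlib
import json
from pathlib import Path


def build_dir_manifest(root):
    """Return one {"md5", "relpath"} record per file under root (assumed format)."""
    root = Path(root)
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            entries.append({
                "md5": hashlib.md5(path.read_bytes()).hexdigest(),
                "relpath": path.relative_to(root).as_posix(),
            })
    # Sort so the same directory contents always yield the same manifest.
    entries.sort(key=lambda entry: entry["relpath"])
    return entries


def manifest_name(entries):
    """Hash the serialized manifest and add the `.dir` extension (hypothetical scheme)."""
    payload = json.dumps(entries, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest() + ".dir"
```

The point of the design is that the single `.dvc` file only needs to record one hash, while the manifest it refers to lists every file in the tree.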

Note that DVC commands that use tracked files support granular targeting of
files, even when the directory is added as a whole. Examples: `dvc push`,
`dvc pull`, `dvc get`, `dvc import`, etc.
> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info. on `.dir` cache entries.
Note that DVC commands that use tracked data support granular targeting of files
and directories, even when contained in a parent directory added as a whole.
Examples: `dvc push`, `dvc pull`, `dvc get`, `dvc import`, etc.

As a rarely needed alternative, the `--recursive` option causes every file in
the hierarchy to be added individually. A corresponding `.dvc` file will be
@@ -192,9 +196,8 @@ outs:
path: pics
```

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> for more info.
> Refer to [Adding entire directories](#adding-entire-directories) for more
> info.

This allows us to treat the entire directory structure as a single <abbr>data
artifact</abbr>. For example, you can pass the whole directory tree as a
2 changes: 1 addition & 1 deletion content/docs/command-reference/cache/dir.md
@@ -17,7 +17,7 @@ positional arguments:
## Description

Helper to set the `cache.dir` configuration option. (See
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).)
Unlike doing so with `dvc config cache`, this command transforms paths (`value`)
that are provided relative to the current working directory into paths
**relative to the config file location**. However, if the `value` provided is an
2 changes: 1 addition & 1 deletion content/docs/command-reference/config.md
@@ -99,7 +99,7 @@ remote. See `dvc remote` for more information.
A DVC project <abbr>cache</abbr> is the hidden storage (by default located in
the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.) This section contains the following options:

- `cache.dir` - set/unset cache directory location. A correct value must be
2 changes: 1 addition & 1 deletion content/docs/command-reference/fetch.md
@@ -189,7 +189,7 @@ $ tree .dvc
Note that the `.dvc/cache` directory was created and populated.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Used without arguments (as above), `dvc fetch` downloads all assets needed by
2 changes: 1 addition & 1 deletion content/docs/command-reference/gc.md
@@ -29,7 +29,7 @@ of commits (determined by reading the DVC-files in them). See the
[Options](#options) section for more details.

> Note that `dvc gc` tries to fetch any missing
> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> from [remote storage](/doc/command-reference/remote) to the local
> <abbr>cache</abbr>, in order to know which files should exist inside cached
> directories. These files may be missing if the cache directory was previously
4 changes: 2 additions & 2 deletions content/docs/command-reference/push.md
@@ -194,7 +194,7 @@ Finally, we used `dvc status` to double check that all data had been uploaded.
## Example: What happens in the cache?

Let's take a detailed look at what happens to the
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
as you run an experiment locally and push data to remote storage. To set the
example consider having created a <abbr>workspace</abbr> that contains some code
and data, and having set up a remote.
@@ -242,7 +242,7 @@ the cache having more files in it than the remote – which is what the `new`
state means.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Next we can copy the remaining data from the cache to the remote using
35 changes: 25 additions & 10 deletions content/docs/command-reference/run.md
@@ -73,23 +73,38 @@ $ dvc run -n printer -d write.sh -o pages ./write.sh
$ dvc run -n scanner -d read.sh -d pages -o signed.pdf ./read.sh
```

Stage dependencies can be any file or directory, either untracked, or more
commonly tracked by DVC or Git. Outputs will be tracked and <abbr>cached</abbr>
by DVC when the stage is run. Every output version will be cached when the stage
is reproduced (see also `dvc gc`).

Relevant notes:

- Typically, scripts being run (or a directory containing the source code) are
included among the specified `-d` dependencies. This ensures that when the
source code changes, DVC knows that the stage needs to be reproduced. (You can
chose whether to do this.)
- Typically, scripts being run (or possibly a directory containing the source
code) are included among the specified `-d` dependencies. This ensures that
when the source code changes, DVC knows that the stage needs to be reproduced.
(You can choose whether to do this.)

- `dvc run` checks the dependency graph integrity before creating a new stage.
For example: two stage cannot explicitly specify the same output, there should
be no cycles, etc.
For example: two stages cannot specify the same output or overlapping output
paths, there should be no cycles, etc.

- DVC does not feed dependency files to the command being run. The program will
have to read by itself the files specified with `-d`.

- Outputs are deleted from the <abbr>workspace</abbr> before executing the
command (including at `dvc repro`), so it should be able to recreate any
directories marked as outputs.
- Entire directories produced by the stage can be tracked as outputs by DVC,
which generates a single `.dir` entry in the cache (refer to
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more info.)

- [External dependencies](/doc/user-guide/external-dependencies) and
[external outputs](/doc/user-guide/managing-external-data) (outside of the
<abbr>workspace</abbr>) are also supported.

- Outputs are deleted from the workspace before executing the command (including
at `dvc repro`) if their paths are found as existing files/directories. This
also means that the stage command needs to recreate any directory structures
defined as outputs every time it's executed by DVC.
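
The integrity checks mentioned in the notes above (no duplicated or overlapping outputs, no cycles) can be illustrated with a small Python sketch. It assumes a hypothetical in-memory description of stages as a dict mapping each stage name to its `deps` and `outs` path lists, and it links a dependency to a producing stage only on an exact path match. It is not how DVC implements these checks.

```python
from pathlib import PurePosixPath


def overlapping_outputs(stages):
    """Return (first_stage, second_stage, path) for outputs that collide or nest."""
    claimed = []  # (stage name, output path) pairs seen so far
    conflicts = []
    for name, stage in stages.items():
        for out in stage["outs"]:
            out_path = PurePosixPath(out)
            for other_name, other_path in claimed:
                if other_name == name:
                    continue
                if (out_path == other_path
                        or other_path in out_path.parents
                        or out_path in other_path.parents):
                    conflicts.append((other_name, name, out))
            claimed.append((name, out_path))
    return conflicts


def has_cycle(stages):
    """Detect a cycle in the stage graph with a depth-first search."""
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}
    # A stage depends on whichever stage produces one of its dependencies.
    edges = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    state = {}  # stage name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return False
        if state.get(name) == "visiting":
            return True  # reached a stage that is still on the current path
        state[name] = "visiting"
        if any(visit(dep) for dep in edges[name]):
            return True
        state[name] = "done"
        return False

    return any(visit(name) for name in stages)
```

With the `printer` and `scanner` stages from the example above, both checks pass; adding a third stage whose output is `pages/cover` would be reported as overlapping with `printer`.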

### For displaying and comparing data science experiments

@@ -117,7 +132,7 @@ systems and require certain software packages to be installed.

Wrap the command with double quotes `"` if there are special characters in it
like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
`dvc run` as a whole. Use single quotes `'` instead if there are environment
`dvc run` itself. Use single quotes `'` instead if there are environment
variables in it that should be evaluated dynamically. Examples:

```dvc
2 changes: 1 addition & 1 deletion content/docs/user-guide/basic-concepts/dvc-cache.md
@@ -6,4 +6,4 @@ match: ['DVC cache', cache, caches, cached]
The DVC cache is a hidden storage (by default located in the `.dvc/cache`
directory) for files that are under DVC control, and their different versions.
For more details, please refer to this
[document](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).
[document](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).
4 changes: 2 additions & 2 deletions content/docs/user-guide/dvc-files-and-directories.md
@@ -236,7 +236,7 @@ separately under `params`, grouped by parameters file.
hand or with the command `dvc config --local`.

- `.dvc/cache`: The <abbr>cache</abbr> directory will store your data in a
special [structure](#structure-of-cache-directory). The data files and
special [structure](#structure-of-the-cache-directory). The data files and
directories in the <abbr>workspace</abbr> will only contain links to the data
files in the cache. (Refer to
[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization). See
@@ -277,7 +277,7 @@ separately under `params`, grouped by parameters file.
dependencies and outputs, to allow safely running multiple DVC commands in
parallel

## Structure of cache directory
## Structure of the cache directory

There are two ways in which the data is stored in <abbr>cache</abbr>: As a
single file (eg. `data.csv`), or a directory of files.
2 changes: 1 addition & 1 deletion content/docs/user-guide/dvcignore.md
@@ -85,7 +85,7 @@ Only the hash values of a directory (`data/`) and one file have been
(`data1`).

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
Now, let's modify file `data1` and see if it affects `dvc status`.
14 changes: 9 additions & 5 deletions content/docs/user-guide/setup-google-drive-remote.md
@@ -18,10 +18,13 @@ to establish GDrive remote connections (e.g. CI/CD).
## Quick start

To start using a Google Drive remote, you only need to add it with a
[valid URL format](#url-format). Then use any DVC command that needs it (e.g.
`dvc pull`, `dvc fetch`, `dvc push`). For example:
[valid URL format](#url-format). Then use any DVC command that needs to connect
to it (e.g. `dvc pull` or `dvc push` once there's tracked data to synchronize).
For example:

```dvc
$ dvc add data
...
$ dvc remote add --default myremote \
gdrive://0AIac4JZqHhKmUk9PDA/dvcstore
$ dvc push
@@ -192,9 +195,10 @@ authentication is needed.
## Authorization

On the first usage of a GDrive [remote](/doc/command-reference/remote), for
example when trying to `dvc push` for the first time after adding the remote
with a [valid URL](#url-format), DVC will prompt you to visit a special Google
authentication web page. There you'll need to sign into your Google account. The
example when trying to `dvc push` tracked data for the first time, DVC will
prompt you to visit a special Google authentication web page. There you'll need
to sign into a Google account with the needed access to the GDrive
[URL](#url-format) in question. The
[auth process](https://developers.google.com/drive/api/v2/about-auth) will ask
you to grant DVC the necessary permissions, and produce a verification code
needed for DVC to complete the connection. On success, the necessary credentials