docs: misc updates #1662

Merged
merged 62 commits into from
Aug 5, 2020
d04f082
cmd: review repro examples
jorgeorpinel Jul 22, 2020
a201abf
cmd: fic tyupo in get-url
jorgeorpinel Jul 22, 2020
f7d1a7e
Merge branch 'master' into jorge
jorgeorpinel Jul 23, 2020
fe8687e
cmd: updates to repro
jorgeorpinel Jul 23, 2020
f3df9d5
cmd: rewrite repro -P desc
jorgeorpinel Jul 23, 2020
ebc1560
cmd: simplified and generalize repro targets desc and DVC file mention
jorgeorpinel Jul 23, 2020
9a8c977
cmd: minor update for repro desc wording
jorgeorpinel Jul 23, 2020
c38ce31
term: don't use "synchronize" in the context of checkout
jorgeorpinel Jul 24, 2020
34217b8
cmd: rewrite Downstream example and added info for sequential executi…
sarthakforwet Jul 24, 2020
a071b7d
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
edff33e
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
71a5088
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
163ed19
cmd: Updated Downstream example
sarthakforwet Jul 25, 2020
cf873a4
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 27, 2020
ca04fb0
repro: Updated Downstream example
sarthakforwet Jul 28, 2020
dded2d7
Merge branch 'repro_misc' of github.com:sarthakforwet/dvc.org into re…
sarthakforwet Jul 28, 2020
30ce7bb
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 29, 2020
66e0603
cmd: updated last para for the description of --downstream and improv…
sarthakforwet Jul 29, 2020
05f0157
Merge branch 'master' into jorge
jorgeorpinel Jul 30, 2020
e532010
cmd: review language of init --subdir
jorgeorpinel Jul 30, 2020
b0fe9c1
term: revuew usage of "granular", esp. around init --subdir
jorgeorpinel Jul 30, 2020
8597f53
repro.md: updated Downstream example
sarthakforwet Jul 30, 2020
8ca9134
cmd: improve init --subdir explanation
jorgeorpinel Jul 30, 2020
8134010
cmd: add info about nested subrepos to init
jorgeorpinel Jul 30, 2020
a5db93f
cmd: fix -P option desc.
jorgeorpinel Jul 31, 2020
40512a1
cmd: improve explanation on how --subdir affects commands
jorgeorpinel Jul 31, 2020
1f77e8f
cmd: simplify nested structures explanation in init
jorgeorpinel Jul 31, 2020
d4bd8f8
Merge branch 'master' into jorge
jorgeorpinel Jul 31, 2020
42b670f
guide: add note aboud `cp` not being a download in external deps
jorgeorpinel Jul 31, 2020
d30bc63
cmd: add note about what --cwd means to repro
jorgeorpinel Jul 31, 2020
0ea4bd3
guide: nvmd! removing that note in external deps
jorgeorpinel Jul 31, 2020
70b7d2a
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
73499f2
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
e40402c
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
1696951
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
bfe6800
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
3a20f21
cmd: more small updates to init
jorgeorpinel Jul 31, 2020
e83bc5d
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
012b72f
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
80d2575
Restyled by prettier
restyled-commits Jul 31, 2020
2d06b22
Merge pull request #1648 from iterative/restyled/pull-1624
jorgeorpinel Jul 31, 2020
a85d8a0
cmd: rewrap metrics diff usage paragraph
jorgeorpinel Aug 3, 2020
38dfee7
Merge branch 'jorge' of github.com:iterative/dvc.org into jorge
jorgeorpinel Aug 3, 2020
fd2a9bb
term: remove "just" from -j desc in 3 refs
jorgeorpinel Aug 3, 2020
8d5adf7
Merge branch 'master' into jorge
jorgeorpinel Aug 5, 2020
221ed75
cmd: add command examples to init --subdir use cases
jorgeorpinel Aug 5, 2020
d14e960
cmd: explain nested repo and projects of all kinds outside of --subdir
jorgeorpinel Aug 5, 2020
5b47bd4
cmd: remove bold names to nested and not-nested structure examples in…
jorgeorpinel Aug 5, 2020
80a0f09
cmd: standardize --jobs option in all refs
jorgeorpinel Aug 5, 2020
5078044
cmd: add speed note to --jobs desc in all refs.
jorgeorpinel Aug 5, 2020
f176872
cmd: change versioning command example in init
jorgeorpinel Aug 5, 2020
cac92e5
cd: change repo comments in init --subdir examples
jorgeorpinel Aug 5, 2020
d1f9d24
cmd: improve note on DVC submodules a little
jorgeorpinel Aug 5, 2020
8d854a1
cmd: better explain why isolation is important in --subdir bullet
jorgeorpinel Aug 5, 2020
be5bc6d
cmd: split last --subdir cases explicitly as 2 bullets
jorgeorpinel Aug 5, 2020
9c3ba77
cmd: remove most notes and code block examples about nesting projects…
jorgeorpinel Aug 5, 2020
67e4248
Merge branch 'master' into jorge
jorgeorpinel Aug 5, 2020
cde3cd2
guide: clarify that tracked data is often needed for
jorgeorpinel Aug 5, 2020
d5d16e4
cmd: add/improve note on tracking directories with add and run
jorgeorpinel Aug 5, 2020
7546c1f
guide: update title of #structure-of-the-cache-directory section of
jorgeorpinel Aug 5, 2020
cb80e02
cmd: mention external deps/outs as a bullet in run notes
jorgeorpinel Aug 5, 2020
c02e81e
cmd: improve note about tracking dirs as outputs in run
jorgeorpinel Aug 5, 2020
2 changes: 1 addition & 1 deletion content/docs/api-reference/get_url.md
@@ -36,7 +36,7 @@ URL returned depends on the
`remote` used (see the [Parameters](#parameters) section).

If the target is a directory, the returned URL will end in `.dir`. Refer to
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
and `dvc add` to learn more about how DVC handles data directories.
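
A minimal sketch of how a caller might act on the `.dir` suffix mentioned above. The helper name `points_to_directory` and the sample URLs are hypothetical; the check is purely string-based and does not contact any storage.

```python
from urllib.parse import urlparse


def points_to_directory(url: str) -> bool:
    """Return True if a storage URL looks like a tracked-directory entry.

    Assumption: directory targets yield a URL whose path ends in ".dir",
    as stated in the paragraph above.
    """
    return urlparse(url).path.endswith(".dir")


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(points_to_directory("s3://bucket/ab/cdef.dir"))
    print(points_to_directory("s3://bucket/ab/cdef"))
```

In practice the string passed in would be whatever `dvc.api.get_url()` returned for the target.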

⚠️ This function does not check for the actual existence of the file or
19 changes: 11 additions & 8 deletions content/docs/command-reference/add.md
@@ -38,7 +38,7 @@ each one:
1. Calculate the file hash.
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file path. (See
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details on
file linking further down).
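
Steps 1 and 2 above can be sketched roughly as follows. This is an illustration, not DVC's implementation: it assumes an MD5 hash and a cache layout where the first two hex characters of the hash name a subdirectory. The function name `cache_path` is hypothetical.

```python
import hashlib
from pathlib import Path


def cache_path(data: bytes, cache_dir: str = ".dvc/cache") -> Path:
    """Map file contents to a cache location derived from their hash.

    Assumptions (not stated in the text above): MD5 is the hash, and the
    first two hex characters become a directory under the cache.
    """
    digest = hashlib.md5(data).hexdigest()
    return Path(cache_dir) / digest[:2] / digest[2:]


if __name__ == "__main__":
    print(cache_path(b"hello world\n"))
```

Because the path depends only on the contents, two identical files map to the same cache entry under this scheme.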
@@ -71,7 +71,7 @@ large files. DVC also supports other link types for use on file systems without
`reflink` support, but they have to be specified manually. Refer to the
`cache.type` config option in `dvc config cache` for more information.

### Tracking directories
### Adding entire directories

A `dvc add` target can be an individual file or a directory. In the latter case,
a `.dvc` file is created for the top of the directory (with default name
@@ -83,9 +83,13 @@ in the directory tree. Instead, the single `.dvc` file references a special JSON
file in the cache (with `.dir` extension), that in turn points to the added
files.

Note that DVC commands that use tracked files support granular targeting of
files, even when the directory is added as a whole. Examples: `dvc push`,
`dvc pull`, `dvc get`, `dvc import`, etc.
> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info on `.dir` cache entries.

Note that DVC commands that use tracked data support granular targeting of files
and directories, even when contained in a parent directory added as a whole.
Examples: `dvc push`, `dvc pull`, `dvc get`, `dvc import`, etc.
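
The special JSON file that "points to the added files" can be pictured with a small sketch. The field names `md5` and `relpath`, and the helper `dir_manifest`, are assumptions made for illustration; the text above only says the file is JSON and references the directory's contents.

```python
import hashlib
import json


def dir_manifest(files: dict) -> str:
    """Build a JSON listing for a directory: one entry per contained file.

    `files` maps a path relative to the directory to that file's bytes.
    Field names ("md5", "relpath") are assumed for this sketch.
    """
    entries = [
        {"md5": hashlib.md5(content).hexdigest(), "relpath": relpath}
        for relpath, content in sorted(files.items())
    ]
    return json.dumps(entries)


if __name__ == "__main__":
    print(dir_manifest({"cat.jpg": b"meow", "dog.jpg": b"woof"}))
```

A listing like this is what would make granular targeting possible: a single entry can be looked up by its relative path without the directory having one `.dvc` file per file.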

As a rarely needed alternative, the `--recursive` option causes every file in
the hierarchy to be added individually. A corresponding `.dvc` file will be
@@ -192,9 +196,8 @@ outs:
path: pics
```

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> for more info.
> Refer to [Adding entire directories](#adding-entire-directories) for more
> info.

This allows us to treat the entire directory structure as a single <abbr>data
artifact</abbr>. For example, you can pass the whole directory tree as a
2 changes: 1 addition & 1 deletion content/docs/command-reference/cache/dir.md
@@ -17,7 +17,7 @@ positional arguments:
## Description

Helper to set the `cache.dir` configuration option. (See
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).)
Unlike doing so with `dvc config cache`, this command transforms paths (`value`)
that are provided relative to the current working directory into paths
**relative to the config file location**. However, if the `value` provided is an
2 changes: 1 addition & 1 deletion content/docs/command-reference/config.md
@@ -99,7 +99,7 @@ remote. See `dvc remote` for more information.
A DVC project <abbr>cache</abbr> is the hidden storage (by default located in
the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.) This section contains the following options:

- `cache.dir` - set/unset cache directory location. A correct value must be
2 changes: 1 addition & 1 deletion content/docs/command-reference/fetch.md
@@ -189,7 +189,7 @@ $ tree .dvc
Note that the `.dvc/cache` directory was created and populated.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.

Used without arguments (as above), `dvc fetch` downloads all assets needed by
2 changes: 1 addition & 1 deletion content/docs/command-reference/gc.md
@@ -29,7 +29,7 @@ of commits (determined by reading the DVC-files in them). See the
[Options](#options) section for more details.

> Note that `dvc gc` tries to fetch any missing
> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> from [remote storage](/doc/command-reference/remote) to the local
> <abbr>cache</abbr>, in order to know which files should exist inside cached
> directories. These files may be missing if the cache directory was previously
4 changes: 2 additions & 2 deletions content/docs/command-reference/push.md
@@ -194,7 +194,7 @@ Finally, we used `dvc status` to double check that all data had been uploaded.
## Example: What happens in the cache?

Let's take a detailed look at what happens to the
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
as you run an experiment locally and push data to remote storage. To set the
example consider having created a <abbr>workspace</abbr> that contains some code
and data, and having set up a remote.
@@ -242,7 +242,7 @@ the cache having more files in it than the remote – which is what the `new`
state means.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.

Next we can copy the remaining data from the cache to the remote using
35 changes: 25 additions & 10 deletions content/docs/command-reference/run.md
@@ -73,23 +73,38 @@ $ dvc run -n printer -d write.sh -o pages ./write.sh
$ dvc run -n scanner -d read.sh -d pages -o signed.pdf ./read.sh
```

Stage dependencies can be any file or directory, either untracked, or more
commonly tracked by DVC or Git. Outputs will be tracked and <abbr>cached</abbr>
by DVC when the stage is run. Every output version will be cached when the stage
is reproduced (see also `dvc gc`).

Relevant notes:

- Typically, scripts being run (or a directory containing the source code) are
included among the specified `-d` dependencies. This ensures that when the
source code changes, DVC knows that the stage needs to be reproduced. (You can
  choose whether to do this.)
- Typically, scripts being run (or possibly a directory containing the source
code) are included among the specified `-d` dependencies. This ensures that
when the source code changes, DVC knows that the stage needs to be reproduced.
  (You can choose whether to do this.)

- `dvc run` checks the dependency graph integrity before creating a new stage.
  For example: two stages cannot explicitly specify the same output, there should
be no cycles, etc.
  For example: two stages cannot specify the same output or overlapping output
paths, there should be no cycles, etc.

- DVC does not feed dependency files to the command being run. The program will
have to read by itself the files specified with `-d`.

- Outputs are deleted from the <abbr>workspace</abbr> before executing the
command (including at `dvc repro`), so it should be able to recreate any
directories marked as outputs.
- Entire directories produced by the stage can be tracked as outputs by DVC,
which generates a single `.dir` entry in the cache (refer to
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
  for more info).

- [External dependencies](/doc/user-guide/external-dependencies) and
[external outputs](/doc/user-guide/managing-external-data) (outside of the
<abbr>workspace</abbr>) are also supported.

- Outputs are deleted from the workspace before executing the command (including
at `dvc repro`) if their paths are found as existing files/directories. This
also means that the stage command needs to recreate any directory structures
  defined as outputs every time it's executed by DVC.
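
The last note has a practical consequence for stage scripts: they cannot rely on an output directory surviving from a previous run. A minimal sketch of a stage command written with that in mind, assuming a hypothetical Python script whose output directory is `pages` (the function name `write_pages` and the file names are illustrative only):

```python
import os
import tempfile


def write_pages(out_dir: str, pages: list) -> list:
    """Write one text file per page, creating the output directory first.

    The directory is recreated on every call because, per the note above,
    existing outputs are deleted before the stage command is executed.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for number, text in enumerate(pages, start=1):
        path = os.path.join(out_dir, f"page{number}.txt")
        with open(path, "w") as handle:
            handle.write(text)
        written.append(path)
    return written


if __name__ == "__main__":
    # A temporary directory stands in for the workspace in this sketch.
    with tempfile.TemporaryDirectory() as workspace:
        print(write_pages(os.path.join(workspace, "pages"), ["first", "second"]))
```

Using `exist_ok=True` keeps the script usable both under DVC (directory absent) and when run by hand (directory possibly present).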

### For displaying and comparing data science experiments

@@ -117,7 +132,7 @@ systems and require certain software packages to be installed.

Wrap the command with double quotes `"` if there are special characters in it
like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
`dvc run` as a whole. Use single quotes `'` instead if there are environment
`dvc run` itself. Use single quotes `'` instead if there are environment
variables in it that should be evaluated dynamically. Examples:

```dvc
2 changes: 1 addition & 1 deletion content/docs/user-guide/basic-concepts/dvc-cache.md
@@ -6,4 +6,4 @@ match: ['DVC cache', cache, caches, cached]
The DVC cache is a hidden storage (by default located in the `.dvc/cache`
directory) for files that are under DVC control, and their different versions.
For more details, please refer to this
[document](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).
[document](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).
4 changes: 2 additions & 2 deletions content/docs/user-guide/dvc-files-and-directories.md
@@ -236,7 +236,7 @@ separately under `params`, grouped by parameters file.
hand or with the command `dvc config --local`.

- `.dvc/cache`: The <abbr>cache</abbr> directory will store your data in a
special [structure](#structure-of-cache-directory). The data files and
special [structure](#structure-of-the-cache-directory). The data files and
directories in the <abbr>workspace</abbr> will only contain links to the data
files in the cache. (Refer to
[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization). See
@@ -277,7 +277,7 @@ separately under `params`, grouped by parameters file.
dependencies and outputs, to allow safely running multiple DVC commands in
parallel

## Structure of cache directory
## Structure of the cache directory

There are two ways in which the data is stored in <abbr>cache</abbr>: As a
single file (eg. `data.csv`), or a directory of files.
2 changes: 1 addition & 1 deletion content/docs/user-guide/dvcignore.md
@@ -85,7 +85,7 @@ Only the hash values of a directory (`data/`) and one file have been
(`data1`).

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.

Now, let's modify file `data1` and see if it affects `dvc status`.
14 changes: 9 additions & 5 deletions content/docs/user-guide/setup-google-drive-remote.md
@@ -18,10 +18,13 @@ to establish GDrive remote connections (e.g. CI/CD).
## Quick start

To start using a Google Drive remote, you only need to add it with a
[valid URL format](#url-format). Then use any DVC command that needs it (e.g.
`dvc pull`, `dvc fetch`, `dvc push`). For example:
[valid URL format](#url-format). Then use any DVC command that needs to connect
to it (e.g. `dvc pull` or `dvc push` once there's tracked data to synchronize).
For example:

```dvc
$ dvc add data
...
$ dvc remote add --default myremote \
gdrive://0AIac4JZqHhKmUk9PDA/dvcstore
$ dvc push
@@ -192,9 +195,10 @@ authentication is needed.
## Authorization

On the first usage of a GDrive [remote](/doc/command-reference/remote), for
example when trying to `dvc push` for the first time after adding the remote
with a [valid URL](#url-format), DVC will prompt you to visit a special Google
authentication web page. There you'll need to sign into your Google account. The
example when trying to `dvc push` tracked data for the first time, DVC will
prompt you to visit a special Google authentication web page. There you'll need
to sign into a Google account with the needed access to the GDrive
[URL](#url-format) in question. The
[auth process](https://developers.google.com/drive/api/v2/about-auth) will ask
you to grant DVC the necessary permissions, and produce a verification code
needed for DVC to complete the connection. On success, the necessary credentials