Commit
* cmd ref: add note that move creates dirs
* cmd ref: improve structure of add ref desc.
* grammar: add some commas
* term: checksum -> hash value in dvcignore guide
* style: lower case bullet text
* cmd ref: remove some redundancy in metrics index
* cmd ref: update plots refs synopsis and descriptions per iterative/dvc/issues/3924 et al.
* Add plots modify cmd
* typo: CSV->csv
* term: working tree -> workspace per iterative/dvc/pull/3914
* cmd ref: couple improvements to add ref per #1382 (review) and #1382 (review)
* Update config/prismjs/dvc-commands.js
* cmd ref: update plots modify description
* cmd ref: add plots modify to nav, with a few more improvements
* cmd ref: plots --show-json -> --show-vega per iterative/dvc#3891 (comment)
* rename x-lab to x-label
* cmd ref: review descriptions of plots index, show, and diff
* cmd ref: review and update old plots cmds options per iterative/dvc#3948 et al.
* cmd ref: a couple more option updates per #1382 (review)
* cmd ref: emphasize add works with any large file/dir per #1382 (review)
* cmd ref: update plots modify top half (definition, description) per #1382 (review) et al.
* cmd ref: improve all plot cmd option descriptions
* Update content/docs/command-reference/plots/modify.md
* cmd ref: review examples (mainly images) in plots modify per #1382 (comment) et al.
* cmd ref: rephrase info about how data arrays are injected to plot templates per #1382 (review)
* cmd ref: update info on how targets for plots show/diff per #1382 (review)
* cmd ref: double check all plots examples per #1382 (comment)
* cmd ref: remove info about plots show --select
* cmd ref: update add desc per #1382 (review)
* cmd ref: re-explain dvc add for dirs per #1382 (review)
* cmd ref: improve description about targets in plots diff per #1382 (review)
* cmd ref: make emoji note in plots index per #1382 (review)
* cmd ref: remove ineffective CSV code block highlighting from plots refs per #1382 (review)
* get started: improve intro in index
* glossary: remove external deps entry (no need)
* cmd ref: update add for 1.0 (1) up to... before Examples
* cmd ref: 1.0 updates for add (2) - examples
* cmd ref: remove note about comments in add example per #1411 (review)

Co-authored-by: Dmitry Petrov <[email protected]>
1 parent 37f4e90, commit 64182e2

Showing 4 changed files with 66 additions and 73 deletions.

@@ -6,11 +6,11 @@ Track data files or directories with DVC, by creating a corresponding
## Synopsis

```usage
usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [-f <filename>]
               targets [targets ...]
usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [--external]
               [-f <filename>] targets [targets ...]
positional arguments:
  targets        Input files/directories to add.
  targets        Files or directories to add
```

## Description

@@ -36,29 +36,30 @@ Under the hood, a few actions are taken for each file (or directory) in

1. Calculate the file hash.
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
   file hash to form the cached file names. (See
   file hash to form the cached file path. (See
   [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
   for more details.)
3. Attempt to replace the file with a link to the cached data (more details
   further down).
4. Create a corresponding `.dvc` file to store the file (as an
   <abbr>output</abbr>), using its path and hash to identify the cached data.
   Unless the `-f` option is used, the `.dvc` file name generated by default is
   `<file>.dvc`, where `<file>` is the file name of the first target.
5. Unless `dvc init --no-scm` was used when initializing the project, add the
   `targets` to `.gitignore` in order to prevent them from being committed to
   the Git repository.
3. Attempt to replace the file with a link to the cached data (more details on
   file linking further down).
4. Create a corresponding [`.dvc` file](/doc/user-guide/dvc-file-format) to
   track the file, using its path and hash to identify the cached data. The
   `.dvc` file lists the DVC-tracked file as an <abbr>output</abbr> (`outs`
   field). Unless the `-f` option is used, the `.dvc` file name generated by
   default is `<file>.dvc`, where `<file>` is the file name of the first target.
5. Add the `targets` to `.gitignore` in order to prevent them from being
   committed to the Git repository (unless `dvc init --no-scm` was used when
   initializing the DVC project).
6. Instructions are printed showing `git` commands for adding the files, if
   appropriate.

Summarizing, the result is that the target data is replaced by small `.dvc`
files that can be tracked with Git. See
files that can easily be tracked with Git. See
[DVC-File Format](/doc/user-guide/dvc-file-format) for more details.
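
Steps 1, 2, and 4 above can be approximated in a few lines of Python. This is a minimal illustration, not DVC's implementation: it assumes an MD5 hash of the file contents and the cache layout seen in this page's examples (the first two hex characters as a subdirectory, e.g. `.dvc/cache/61/37cde4893c59f76f005a8123d8e8e6`). It copies the file instead of moving and linking it (step 3 is omitted), and the helper names `file_md5`, `cache_file`, and `default_dvc_filename` are hypothetical.

```python
import hashlib
import os
import shutil


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Step 1: hash the file contents, reading in chunks so large files fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def cache_file(path: str, cache_dir: str = ".dvc/cache") -> str:
    """Step 2 (simplified): store the contents under a path derived from the hash.

    The first two hex characters name a subdirectory and the remaining 30 name
    the file. Unlike DVC, this sketch copies the data and creates no link.
    """
    digest = file_md5(path)
    cached = os.path.join(cache_dir, digest[:2], digest[2:])
    os.makedirs(os.path.dirname(cached), exist_ok=True)
    if not os.path.exists(cached):
        shutil.copyfile(path, cached)
    return cached


def default_dvc_filename(target: str) -> str:
    """Step 4: without -f, the .dvc file is named after the target, as <file>.dvc."""
    return target + ".dvc"
```

Because the cache path depends only on the contents, adding the same data twice maps to the same cache entry, which is why the sketch skips the copy when the entry already exists.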

> Note that `.dvc` files created by this command are considered _orphan stage
> files_ because they have no _dependencies_, only outputs. These are always
> treated as _changed_ by `dvc repro`, which always executes them. See `dvc run`
> to learn more about stage files.
> Note that `.dvc` files can be considered _orphan stages_, because they have no
> <abbr>dependencies</abbr>, only outputs. These are treated as _always changed_
> by `dvc status` and `dvc repro`, which always executes them. See
> [`dvc.yaml`](/doc/user-guide/dvc-file-format) to learn more about stages.

To avoid adding files inside a directory accidentally, you can add the
corresponding [patterns](/doc/user-guide/dvcignore) in a `.dvcignore` file.
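
As a rough illustration of how ignore patterns can exclude files inside a directory target, the Python sketch below filters a directory listing. It is a simplification that assumes `fnmatch`-style wildcards matched against either the path relative to the directory or the file's base name. Real `.dvcignore` files follow `.gitignore` semantics (negation with `!`, anchoring, directory-only patterns), which are not implemented here, and `files_to_add` is a hypothetical name.

```python
import fnmatch
import os
from typing import Iterable, List


def files_to_add(root: str, patterns: Iterable[str]) -> List[str]:
    """Return '/'-separated paths under root that no ignore pattern matches.

    Simplification: a pattern matches if it matches the full relative path or
    just the base name (loosely like .gitignore patterns without a slash).
    Note that fnmatch's '*' also matches '/' characters.
    """
    patterns = list(patterns)
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            ignored = any(
                fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p) for p in patterns
            )
            if not ignored:
                kept.append(rel)
    return sorted(kept)
```

For example, given the files `dir/file1`, `dir/file2`, and `file3.tmp` with the patterns `dir/file1` and `*.tmp`, this sketch keeps only `dir/file2`.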

@@ -111,6 +112,9 @@ undesirable for data directories with a large number of files.
  file name of the given target. This option allows you to set the name and the
  path of the generated `.dvc` file.

- `--external` - allow `targets` that are outside of the DVC repository. See
  [Managing External Data](/doc/user-guide/managing-external-data).

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no

@@ -124,15 +128,14 @@ Track a file with DVC:

```dvc
$ dvc add data.xml
...
Saving information to 'data.xml.dvc'.
To track the changes with git run:
To track the changes with git, run:
git add .gitignore data.xml.dvc
git add .gitignore data.xml.dvc
```

As shown above, a [`.dvc` file](/doc/user-guide/dvc-file-format) has been
As indicated above, a [`.dvc` file](/doc/user-guide/dvc-file-format) has been
created for `data.xml`. Let's explore the result:

```dvc

@@ -145,32 +148,21 @@ $ tree
Let's check the `data.xml.dvc` file inside:

```yaml
md5: aae37d74224b05178153acd94e15956b
outs:
  - cache: true
    md5: d8acabbfd4ee51c95da5d7628c7ef74b
    metric: false
  - md5: 6137cde4893c59f76f005a8123d8e8e6
    path: data.xml
meta: # Special field to contain arbitrary user data
  name: John
  email: [email protected]
```

This is a standard `.dvc` file with only one output (in the `outs` field). The
hash value should correspond to a file path in the <abbr>cache</abbr>.

> Note that the `meta` values above were entered manually for this example. Meta
> values and `#` comments are not preserved when a `.dvc` file is overwritten
> with the `dvc add`, `dvc run`, `dvc import`, or `dvc import-url` commands.

This is a standard `.dvc` file with only one output (`outs` field). The hash
value (`md5` field) corresponds to a file path in the <abbr>cache</abbr>.

```dvc
$ file .dvc/cache/d8/acabbfd4ee51c95da5d7628c7ef74b
.dvc/cache/d8/acabbfd4ee51c95da5d7628c7ef74b: ASCII text
.dvc/cache/61/37cde4893c59f76f005a8123d8e8e6: ASCII text
```
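
Because the cache path is derived from the hash value, a cached file can be checked by rehashing it and comparing the result with its own path. A minimal sketch, assuming the MD5, two-character-prefix layout shown above; `md5_from_cache_path` and `verify_cache_entry` are hypothetical helpers, not DVC commands, and directory entries (with the `.dir` extension) are not handled.

```python
import hashlib
import os


def md5_from_cache_path(cache_path: str) -> str:
    """Rebuild the hash value from '<cache>/<2 hex chars>/<30 hex chars>'."""
    prefix = os.path.basename(os.path.dirname(cache_path))
    rest = os.path.basename(cache_path)
    return prefix + rest


def verify_cache_entry(cache_path: str) -> bool:
    """Return True if the file's MD5 matches the hash encoded in its path.

    Intended for regular file entries only; '.dir' entries would never match.
    """
    md5 = hashlib.md5()
    with open(cache_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest() == md5_from_cache_path(cache_path)
```

For instance, `md5_from_cache_path(".dvc/cache/61/37cde4893c59f76f005a8123d8e8e6")` gives back `6137cde4893c59f76f005a8123d8e8e6`, the `md5` value recorded in `data.xml.dvc`.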

Note that tracking compressed files (e.g. ZIP or TAR archives) is not
recommended, as `dvc add` supports tracking directories. (Details below.)
⚠️ Note that tracking compressed files (e.g. ZIP or TAR archives) is not
recommended, as `dvc add` supports tracking directories (details below).

## Example: Directory

@@ -193,63 +185,64 @@ Tracking a directory with DVC is as simple as with a single file:

```dvc
$ dvc add pics
Computing md5 for a large number of files. This is only done once.
...
Linking directory 'pics'.
Saving information to 'pics.dvc'.
...
```

There are no [`.dvc` files](/doc/user-guide/dvc-file-format) generated within
this directory structure, but the images are all added to the
<abbr>cache</abbr>. DVC prints a message mentioning that MD5 hash values are
computed for each file. A single `pics.dvc` file is generated for the top-level
this directory structure to match each image, but the image files are all
<abbr>cached</abbr>. A single `pics.dvc` file is generated for the top-level
directory, and it contains:

```yaml
md5: df06d8d51e6483ed5a74d3979f8fe42e
outs:
  - cache: true
    md5: b8f4d5a78e55e88906d5f4aeaf43802e.dir
    metric: false
  - md5: ce57450aa92ab8f2b957c24b0df73edc.dir
    path: pics
wdir: .
```
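
The `.dir` suffix marks a hash computed over a listing of the directory's files rather than over a single file's contents. The Python sketch below illustrates the idea only: it assumes the listing is a JSON array of `{"md5", "relpath"}` entries sorted by relative path, which approximates but may not exactly match DVC's own serialization, so it should not be expected to reproduce the hash values shown above. `dir_hash` is a hypothetical name.

```python
import hashlib
import json
import os
from typing import Dict, List, Tuple


def file_md5(path: str) -> str:
    """MD5 of one file's contents, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def dir_hash(root: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return ('<md5>.dir', listing) for all files under root (simplified)."""
    listing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            listing.append({"md5": file_md5(full), "relpath": rel})
    # Sort and serialize deterministically so identical trees hash identically.
    listing.sort(key=lambda entry: entry["relpath"])
    payload = json.dumps(listing, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest() + ".dir", listing
```

Changing any file's contents, or adding, removing, or renaming a file, changes the listing and therefore the directory hash, which is what lets the whole tree be versioned as one unit.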

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> for more info.

This allows us to treat the entire directory structure as a single <abbr>data
artifact</abbr>. This lets you pass the whole directory tree as a
artifact</abbr>. For example, you can pass the whole directory tree as a
<abbr>dependency</abbr> to a `dvc run` stage definition:

```dvc
$ dvc run -f train.dvc \
$ dvc run -n train \
-d train.py -d pics \
-M metrics.json -o model.h5 \
python train.py
```

> To follow the full example, see the [Versioning](/doc/tutorials/versioning)
> tutorial.
> To try this example, see the [Versioning](/doc/tutorials/versioning) tutorial.

If instead we use the `--recursive` (`-R`) option, the output looks like this:

```dvc
$ dvc add -R pics
Saving information to 'pics/cat1.jpg.dvc'.
Saving information to 'pics/cat3.jpg.dvc'.
Saving information to 'pics/cat2.jpg.dvc'.
Saving information to 'pics/cat4.jpg.dvc'.
...
```

In this case, a `.dvc` file is generated for each file in the `pics/` directory
tree. No top-level `.dvc` file is generated, which is typically less convenient.
For example, we cannot use the directory structure as one unit with `dvc run` or
other commands.
tree:

```dvc
$ tree pics
pics
├── train
| ├── cats
| | ├── img1.jpg
| | ├── img1.jpg.dvc
| | ├── img2.jpg
| | ├── img2.jpg.dvc
| | ├── ...
| └── dogs
| ├── img1.jpg
| ├── img1.jpg.dvc
| ...
```

Note that no top-level `.dvc` file is generated, which is typically less
convenient. For example, we cannot use the directory structure as one unit with
`dvc run` or other commands.
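
The difference between the two modes comes down to which `.dvc` files get written. A small Python sketch, assuming the default `<file>.dvc` naming described above and ignoring both `-f` and `.dvcignore` patterns; `planned_dvc_files` is a hypothetical helper, not part of DVC.

```python
import os
from typing import List


def planned_dvc_files(target: str, recursive: bool) -> List[str]:
    """List the .dvc files that adding a directory target would produce.

    Without -R: a single top-level '<dir>.dvc'.
    With -R: one '<file>.dvc' next to every file in the tree.
    """
    target = os.path.normpath(target)
    if not recursive:
        return [target + ".dvc"]
    planned = []
    for dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            if name.endswith(".dvc"):
                continue  # skip .dvc files left over from a previous run
            planned.append(os.path.join(dirpath, name) + ".dvc")
    return sorted(planned)
```

For a tree like `pics/train/cats/img1.jpg`, the non-recursive call yields only `pics.dvc`, while the recursive call yields one entry per image, mirroring the `tree pics` listing above.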

## Example: Dvcignore

@@ -290,6 +283,7 @@ $ tree .dvc/cache
└── 4bcc8502a70ac49bf441db350eafc2
```

Only the hash values of directory (`dir/`) and `file2` have been cached.
Only the hash values of the `dir/` directory (with `.dir` file extension) and
`file2` have been cached.

See [Dvcignore](/doc/user-guide/dvcignore) for more details.