cmd: document target granularity for push/pull/etc, et al. #1384

Merged
merged 71 commits into from
Aug 6, 2020
Commits
08e2a25
docs: document target granularity for push/pull/etc
efiop May 31, 2020
619bdd9
Merge branch 'master' into fix-886
jorgeorpinel Jul 9, 2020
eba2cf5
cmd: move parenthesis
jorgeorpinel Jul 9, 2020
e24e2bb
cmd: prep status for granularity info
jorgeorpinel Jul 9, 2020
76e94b1
cmd: remove _changed checksum_ status
jorgeorpinel Jul 9, 2020
3398775
Merge branch 'master' into fix-886
jorgeorpinel Jul 10, 2020
1bf9af5
cmd: reinstate _changed checksum_ status (but only for .dvc files)
jorgeorpinel Jul 10, 2020
94a86c5
cmd: explain granularity for status, and other 1.0 updates
jorgeorpinel Jul 10, 2020
489a6c1
cmd: add info about granularity to push and pull
jorgeorpinel Jul 10, 2020
169db68
cmd: update target arg desc in several commands
jorgeorpinel Jul 11, 2020
f866f1e
cmd: simplify status target behavior desc.
jorgeorpinel Jul 11, 2020
06fe612
cmd: remove granularity note from status desc, but leave example note
jorgeorpinel Jul 11, 2020
a0292ef
cmd: add granularity example to status
jorgeorpinel Jul 11, 2020
02340f1
cmd: granularity examples for status and fetch
jorgeorpinel Jul 12, 2020
c2d7b5b
cmd: fix example in status
jorgeorpinel Jul 12, 2020
b01ccd9
cmd: fix formatting in a few cmds
jorgeorpinel Jul 12, 2020
44e3396
cmd: fixes to checkout desc
jorgeorpinel Jul 13, 2020
e81060a
cmd: improve fetch ref per coming changes to checkout...
jorgeorpinel Jul 13, 2020
bd3f657
cmd: granularity example for checkout
jorgeorpinel Jul 13, 2020
ae6f85e
cmd: note which commands support granularity in file/dir targets
jorgeorpinel Jul 13, 2020
bcd2c88
cmd: link granularity notes to add directory example and
jorgeorpinel Jul 13, 2020
bcbfe68
cmd: add notes about granular path support for (list) get and import
jorgeorpinel Jul 13, 2020
12bb0c4
cmd: roll back note about orphan stages
jorgeorpinel Jul 13, 2020
cb518f9
cmd: roll back note about usefulness of checkout
jorgeorpinel Jul 13, 2020
5e29b0d
cmd: improve first step bullet in checkout
jorgeorpinel Jul 13, 2020
46404b3
cmd: remove note on orphan stages from add
jorgeorpinel Jul 13, 2020
e30ad99
cmd: fix characters in fetch
jorgeorpinel Jul 14, 2020
9327d61
cmd: small fix to push/pull
jorgeorpinel Jul 14, 2020
d67d0ca
cmd: make granularity note part of path desc in list/get/import
jorgeorpinel Jul 14, 2020
29f7465
cmd: update target granularity notes in add and checkout
jorgeorpinel Jul 14, 2020
d9b7f53
cmd: update example title and note on granularity
jorgeorpinel Jul 14, 2020
b241bf5
cmd: shorten remaining notes on granularity
jorgeorpinel Jul 14, 2020
7502b84
cmd: update specific target example titles
jorgeorpinel Jul 14, 2020
f3f6425
cmd: improve targets explanation in fetch
jorgeorpinel Jul 14, 2020
81db6e1
cmd: simplify granularity example note in checkout
jorgeorpinel Jul 14, 2020
9879c1d
cmd: further simplify notes about granularity
jorgeorpinel Jul 15, 2020
46caf71
cmd: fix typo in fetch
jorgeorpinel Jul 15, 2020
7f354d9
Merge branch 'master' into fix-886
jorgeorpinel Jul 15, 2020
f434350
Merge branch 'master' into fix-886
jorgeorpinel Jul 20, 2020
99a9789
cmd: update granularity note in add: Tracking directories + ex
jorgeorpinel Jul 20, 2020
a36ba41
cmd: update checkout ref to improve tagets and granularity explanations
jorgeorpinel Jul 20, 2020
80074ed
cmd: correct dvc.yaml -> dvc.lock in status, checkout, and fetch
jorgeorpinel Jul 20, 2020
73512e1
cmd: introduce p about fetch targets arg
jorgeorpinel Jul 20, 2020
cac98e3
Update content/docs/command-reference/status.md
jorgeorpinel Jul 20, 2020
b89fed8
cmd: intro status targets arg p and granularity note
jorgeorpinel Jul 20, 2020
b745aa6
cmd: update status and fetch example granularity note
jorgeorpinel Jul 20, 2020
f3c8dd4
Merge branch 'fix-886' of github.com:iterative/dvc.org into fix-886
jorgeorpinel Jul 20, 2020
8e7f63a
cmd: improve status dependency example and text
jorgeorpinel Jul 20, 2020
20927fe
cmd: apply unified notes about target granularity to push and pull
jorgeorpinel Jul 20, 2020
5cc92af
cmd: double check granularity notes in list/import/get are unified
jorgeorpinel Jul 20, 2020
3ecc352
Merge branch 'master' into fix-886
jorgeorpinel Jul 22, 2020
ff3d7aa
cmd: update dvc.lock -> dvc.yaml in some cases
jorgeorpinel Jul 22, 2020
e87e397
cmd: rewrite checkout desc per private feedback
jorgeorpinel Jul 23, 2020
478c8a2
cmd: rewrite checkout desc
jorgeorpinel Jul 24, 2020
1e420b6
cmd: don't use term "synchronize" in checkout
jorgeorpinel Jul 24, 2020
9f1876f
cmd: a couple more updates to checkout
jorgeorpinel Jul 24, 2020
508cf97
cmd: mention checkout deals with several filel/dirs in example
jorgeorpinel Jul 24, 2020
9895f02
Merge branch 'master' into fix-886
jorgeorpinel Aug 2, 2020
a0c869b
cmd: fix fetch description and related passages in other refs
jorgeorpinel Aug 3, 2020
87faff3
cmd: update note about granularity in list, get, import
jorgeorpinel Aug 3, 2020
f280bdf
cmd: update status example title
jorgeorpinel Aug 3, 2020
5370f1f
cmd: more feedback on fetch rewrite
jorgeorpinel Aug 4, 2020
41f4db7
cmd: simplify granularity note in all its docs
jorgeorpinel Aug 4, 2020
4cac81f
term: don't use "as a whole" phrase for tracked dirs
jorgeorpinel Aug 4, 2020
cabc8bf
cmd: make paragraph plural for consistency
jorgeorpinel Aug 4, 2020
6ac8b0f
cmd: update dvc.yaml vs lock file mention in status
jorgeorpinel Aug 4, 2020
8fd47ac
Merge branch 'master' into fix-886
jorgeorpinel Aug 5, 2020
b3eb463
cmd: edit notes about .dvcignore. General one in desc, specific one i…
jorgeorpinel Aug 5, 2020
a919f85
cmd: mention dvc.yaml in checkout desc
jorgeorpinel Aug 6, 2020
d6f9610
cmd: small corrections to fetch
jorgeorpinel Aug 6, 2020
acf0196
cmd: update explanation of stages and output hash values in status
jorgeorpinel Aug 6, 2020
13 changes: 4 additions & 9 deletions content/docs/command-reference/add.md
@@ -59,12 +59,6 @@ Summarizing, the result is that the target data is replaced by small
[`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) that can be
easily tracked with Git.

> Note that `.dvc` files can be considered _orphan stages_, because they have no
> <abbr>dependencies</abbr>, only outputs. These are treated as _always changed_
> by `dvc status` and `dvc repro`, which always executes them. See
> [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) to learn
> more about stages.

To avoid adding files inside a directory accidentally, you can add the
corresponding [patterns](/doc/user-guide/dvcignore) in a `.dvcignore` file.

@@ -87,9 +81,10 @@ in the directory tree. Instead, the single `.dvc` file references a special JSON
file in the cache (with `.dir` extension), that in turn points to the added
files.

Note that DVC commands that use tracked files support granular targeting of
files, even when the directory is added as a whole. Examples: `dvc push`,
`dvc pull`, `dvc get`, `dvc import`, etc.
Note that DVC commands that use tracked files or directories support targeting
them granularly, even inside a directory that is
[added as a whole](#example-directory). Examples: `dvc status`, `dvc push`,
`dvc pull`, among others.

As a rarely needed alternative, the `--recursive` option causes every file in
the hierarchy to be added individually. A corresponding `.dvc` file will be
104 changes: 60 additions & 44 deletions content/docs/command-reference/checkout.md
@@ -10,17 +10,17 @@ usage: dvc checkout [-h] [-q | -v] [--summary] [-d] [-R] [-f]
[--relink] [targets [targets ...]]

positional arguments:
targets Limit command scope to these stages or .dvc files.
Using -R, directories to search for stages or .dvc
files can also be given.
targets Limit command scope to these tracked files/directories,
.dvc files, or stage names.
```

## Description

`.dvc` and `dvc.lock` [files](/doc/user-guide/dvc-files-and-directories) act as
pointers to specific version of data files or directories tracked by DVC. This
command synchronizes the workspace data with the versions specified in the
current `.dvc` and `dvc.lock` files.
[`dvc.lock`](/doc/user-guide/dvc-files-and-directories#dvclock-file) and
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files act as
pointers to the <abbr>cached</abbr> contents of data tracked by DVC. This
command synchronizes the workspace data with the tracked file contents specified
in the current `.dvc` and `dvc.lock` files.

`dvc checkout` is useful, for example, when using Git in the
<abbr>project</abbr>, after `git clone`, `git checkout`, or any other operation
@@ -33,15 +33,18 @@ for more details.

The execution of `dvc checkout` does the following:

- Scans the `.dvc` and `dvc.lock` files to compare against the data files or
directories in the <abbr>workspace</abbr>. DVC knows which data
(<abbr>outputs</abbr>) match because the corresponding hash values are saved
in the `outs` fields in those files. Scanning is limited to the given
- Scans all `dvc.lock` and `.dvc` files to compare the hash values of their
<abbr>outputs</abbr> against the actual data files or directories in the
workspace (similar to `dvc status`). Scanning is limited to the given
`targets` (if any). See also options `--with-deps` and `--recursive` below.

- Missing data files or directories are restored from the <abbr>cache</abbr>.
Those that don't match with any DVC-file are removed. See options `--force`
and `--relink`. A list of the changes done is printed.
- Missing data files or directories are restored from the cache. Those that
don't match with any DVC-file are removed. See options `--force` and
`--relink`. A list of the changes done is printed.
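
As a rough illustration of the comparison step described in the bullets above, here is a minimal Python sketch. It is not DVC's implementation: `file_md5`, `plan_checkout`, and the `recorded` mapping (output path to expected MD5) are hypothetical names, and the sketch only handles plain files, ignoring directories, cache links, and any special handling DVC may apply when hashing.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_checkout(recorded, workspace):
    """Classify each recorded output as 'ok', 'missing', or 'modified'.

    `recorded` is a hypothetical mapping of workspace-relative paths to the
    hash values saved for them (an assumption made for illustration only).
    """
    plan = {}
    for rel_path, expected in recorded.items():
        target = Path(workspace) / rel_path
        if not target.exists():
            plan[rel_path] = "missing"   # would be restored from the cache
        elif file_md5(target) == expected:
            plan[rel_path] = "ok"        # nothing to do
        else:
            plan[rel_path] = "modified"  # would be replaced from the cache
    return plan
```

Under these assumptions, outputs classified as `missing` or `modified` are the ones a checkout would need to bring back from the cache.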

> Note that `dvc checkout` supports granular targeting of files inside
> directories that are
> [tracked as a whole](/doc/command-reference/add#example-directory).

By default, this command tries not to make copies of cached files in the workspace,
using reflinks instead when supported by the file system (refer to
@@ -130,9 +133,8 @@ below.

The workspace looks like this:

````dvc
```dvc
.
├── README.md
├── data
│   └── data.xml.dvc
├── dvc.lock
@@ -141,15 +143,11 @@ The workspace looks like this:
├── prc.json
├── scores.json
└── src
├── evaluate.py
├── featurization.py
├── prepare.py
├── requirements.txt
└── train.py```
````
└── <code files here>
```

This repository includes the following tags, that represent different variants
of the resulting model:
Note that this repository includes the following tags, that represent different
variants of the resulting model:

```dvc
$ git tag
@@ -158,10 +156,11 @@ baseline-experiment <- First simple version of the model
bigrams-experiment <- Uses bigrams to improve the model
```

We can now just run `dvc checkout` that will update the most recent `model.pkl`,
`data.xml`, and other files that are tracked by DVC. The model file hash is
defined in the `dvc.lock` file, and in the `data.xml.dvc` file for the
`data.xml`:
We can now run `dvc checkout` to update the most recent `model.pkl`, `data.xml`,
and any other files tracked by DVC. The model file hash, `ab349c2...`, is saved
in the
[`dvc.lock` file](/doc/user-guide/dvc-files-and-directories#dvclock-file), for
example, so it can be confirmed with:

```dvc
$ dvc checkout
@@ -170,13 +169,15 @@ $ md5 model.pkl
MD5 (model.pkl) = ab349c2b5fa2a0f66d6f33f94424aebe
```

## Example: Switch versions

What if we want to "rewind history", so to speak? The `git checkout` command
lets us restore any point in the repository history, including any tags. It
automatically adjusts the files, by replacing file content and adding or
deleting files as necessary.
lets us restore any commit in the repository history (including tags). It
automatically adjusts the repo files, by replacing, adding, or deleting them as
necessary.

```dvc
$ git checkout baseline-experiment # Stage where model is first created
$ git checkout baseline-experiment # Commit where model was created
```

Let's check the hash value of `model.pkl` in `dvc.lock` now:
@@ -187,16 +188,10 @@ outs:
md5: 98af33933679a75c2a51b953d3ab50aa
```

But if you check `model.pkl`, the file hash is still the same:

```dvc
$ md5 model.pkl
MD5 (model.pkl) = ab349c2b5fa2a0f66d6f33f94424aebe
```

This is because `git checkout` changed `dvc.lock` and other DVC files. But it
did nothing with the `model.pkl` and `matrix.pkl` files. Git doesn't track those
files; DVC does, so we must do this:
But if you check the MD5 of `model.pkl`, the file hash is still the same
(`ab349c2...`). This is because `git checkout` changed `dvc.lock` and other DVC
files, but it did nothing with `model.pkl`. Git doesn't track this file, DVC
does, so we must do this instead:

```dvc
$ dvc checkout
@@ -207,8 +202,29 @@ $ md5 model.pkl
MD5 (model.pkl) = 98af33933679a75c2a51b953d3ab50aa
```

What happened is that DVC went through the DVC-files and adjusted the current
set of <abbr>output</abbr> files to match the `outs` in them.
DVC went through `dvc.lock` and adjusted the current set of <abbr>outputs</abbr>
to match the `outs` in it.

## Example: Specific files or directories

`dvc checkout` only affects the tracked data corresponding to any given
`targets`:

```dvc
$ git checkout master
$ dvc checkout # Start with latest version of everything.

$ git checkout baseline-experiment -- dvc.lock
$ dvc checkout model.pkl # Get previous model file only.
```

Note that `dvc checkout` supports granular targeting of files inside directories
that are tracked as a whole. For example, the `featurize` stage has a directory
output (`data/features`) and we can do:

```dvc
$ dvc checkout data/features/test.pkl
```
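
To make the idea of granular targeting more concrete, the following Python sketch shows one way a target such as `data/features/test.pkl` could be matched against the JSON listing of a tracked directory. This is only an illustration: `select_entries` is a hypothetical helper, and the manifest layout (a list of objects with `md5` and `relpath` keys) is an assumption made here, not something this document specifies.

```python
import json


def select_entries(manifest_json, dir_path, target):
    """Return the manifest entries that a target path refers to.

    `manifest_json` is assumed to be a JSON list of objects with "md5" and
    "relpath" keys, where "relpath" is relative to the tracked directory.
    """
    entries = json.loads(manifest_json)
    dir_path = dir_path.rstrip("/")
    target = target.rstrip("/")

    # Targeting the directory itself selects everything inside it.
    if target == dir_path:
        return entries

    prefix = dir_path + "/"
    if not target.startswith(prefix):
        return []  # the target lies outside this tracked directory

    rel = target[len(prefix):]
    # Match a single file, or every file under a subdirectory.
    return [
        entry for entry in entries
        if entry["relpath"] == rel or entry["relpath"].startswith(rel + "/")
    ]
```

With a helper like this, only the selected entries would need to be compared and restored, rather than the whole directory.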

## Example: Automating DVC checkout
