cmd ref: consistency updates on rev, rev_lock fields...
and related concepts throughout the docs
jorgeorpinel committed Nov 19, 2019
1 parent df3781d commit 89026ce
Showing 5 changed files with 38 additions and 29 deletions.
7 changes: 3 additions & 4 deletions static/docs/command-reference/import.md
@@ -121,8 +121,9 @@ outs:
```
Several of the values above are pulled from the original stage file
`model.pkl.dvc` in the external DVC repository. `url` and `rev_lock` subfields
under `repo` are used to save the origin and version of the dependency.
`model.pkl.dvc` in the external DVC repository. The `url` and `rev_lock`
subfields under `repo` are used to save the origin and version of the
dependency.

## Example: fixed revisions & re-importing

@@ -154,8 +155,6 @@ deps:
If the
[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
moves (e.g. branches), you may use `dvc update` to bring the data up to date.
This will update `rev_lock` in the import stage (DVC-file).

However, for typically static references (e.g. tags), or for SHA commits, in
order to actually "update" an import, it's necessary to **re-import the data**
instead, by using `dvc import` again without or with a different `--rev`. This
13 changes: 7 additions & 6 deletions static/docs/command-reference/update.md
@@ -25,8 +25,7 @@ DVC-file `targets` as command arguments.

Note that import stages are considered always "locked", meaning that if you run
`dvc repro`, they won't be updated. `dvc update` is the only command that can
update them. Also, for `dvc import` import stages, the `rev_lock` field is
updated by `dvc update`.
update them.

Another detail to note is that when the `--rev` (revision) option of
`dvc import` has been used to create an import stage, DVC is not aware of what
@@ -54,10 +53,8 @@ Let's first import a data artifact from our

```dvc
$ dvc import git@github.com:iterative/example-get-started model.pkl
Importing 'model.pkl (git@github.com:iterative/example-get-started)' -> 'model.pkl'
...
Saving information to 'model.pkl.dvc'.
...
Importing 'model.pkl (git@github.com:iterative/example-get-started)'
-> 'model.pkl'
```

As DVC mentions, the import stage (DVC-file) `model.pkl.dvc` is created. This
@@ -74,3 +71,7 @@ Saving information to 'model.pkl.dvc'.

This time nothing has changed, since the source <abbr>project</abbr> is rather
stable.

> Note that `dvc update` updates the `rev_lock` field of the corresponding
> [DVC-file](/doc/user-guide/dvc-file-format) (when there are changes to bring
> in).
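
As a rough illustration of the note above (not part of this commit), the pinned commit can be read straight from the DVC-file, for example before and after running `dvc update`, to see whether anything was brought in. This is only a sketch: the sample file contents, URL, and hash are hypothetical placeholders, and `get_rev_lock` is a helper name invented here, not a DVC command.

```shell
#!/bin/sh
# Hypothetical DVC-file contents, written to a temp file for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
deps:
- path: model.pkl
  repo:
    url: https://example.com/hypothetical/repo
    rev_lock: 0123456789abcdef0123456789abcdef01234567
EOF

# Print the value of the first `rev_lock` subfield found in a DVC-file.
get_rev_lock() {
  awk '$1 == "rev_lock:" { print $2; exit }' "$1"
}

get_rev_lock "$sample"
rm -f "$sample"
```

Calling the helper on `model.pkl.dvc` before and after `dvc update model.pkl.dvc` and comparing the two values would show whether the import moved.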
22 changes: 13 additions & 9 deletions static/docs/get-started/import-data.md
@@ -3,12 +3,12 @@
We've seen how to [push](/doc/get-started/store-data) and
[pull](/doc/get-started/retrieve-data) data from/to a <abbr>DVC project</abbr>'s
[remote](/doc/command-reference/remote). But what if we wanted to integrate a
dataset or ML model produced in one project into another project?
dataset or ML model produced in one project into another one?

One way is to download the data (with `wget` or `dvc get`, for example) and use
`dvc add` to track it, but the connection between projects would be lost. We
wouldn't be able to tell where the data came from or whether there are new
versions available. A better alternative is the `dvc import` command:
One way is to manually download the data (with `wget` or `dvc get`, for example)
and use `dvc add` to track it, but the connection between the projects would be
lost. We wouldn't be able to tell where the data came from or whether there are
new versions available. A better alternative is the `dvc import` command:

<!--
In the [Add Files](/doc/get-started/add-files) chapter, for example, we download
@@ -31,7 +31,7 @@ This downloads `data.xml` from our
[dataset-registry](https://github.com/iterative/dataset-registry) project into
the current working directory, adds it to `.gitignore`, and creates the
`data.xml.dvc` [DVC-file](/doc/user-guide/dvc-file-format) to track changes in
the source data. With _imports_, we can use `dvc update` to check for changes in
the source data. With _imports_, we can use `dvc update` to bring in changes in
the external data source before [reproducing](/doc/get-started/reproduce) any
<abbr>pipeline</abbr> that depends on this data.

@@ -67,9 +67,11 @@ outs:
persist: false
```
The `url` subfield under `repo` points to the source project, while `rev_lock`
lets DVC know which Git repository version the data came from. Note that
`dvc update`, when successful, updates the `rev_lock` value.
The `url` and `rev_lock` subfields under `repo` are used to save the origin and
version of the dependency.

> Note that `dvc update` updates the `rev_lock` field of the corresponding
> DVC-file (when there are changes to bring in).

</details>

@@ -80,3 +82,5 @@ to normal with:
$ git reset --hard
$ rm -f data.*
```

> See also `dvc import-url`.
9 changes: 7 additions & 2 deletions static/docs/user-guide/dvc-file-format.md
@@ -63,11 +63,16 @@ A dependency entry consists of a pair of fields:
- `etag`: Strong ETag response header (only HTTP <abbr>external
dependencies</abbr> created with `dvc import-url`)
- `repo`: This entry is only for external dependencies created with
`dvc import`, and in itself contains the following fields:
`dvc import`, and can contain the following fields:

- `url`: URL of Git repository with source DVC project
- `rev`: Only present when the `--rev` option of `dvc import` is used.
Specific
[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
used to import the dependency from.
- `rev_lock`: Revision or version (Git commit hash) of the external <abbr>DVC
repository</abbr> at the time of importing the dependency
repository</abbr> at the time of importing or updating (with `dvc update`)
the dependency.
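
To make the field list above concrete (this sketch is not part of the commit), the snippet below writes a hypothetical import stage with all three `repo` subfields and checks whether `rev` is present, which per the description only happens when `--rev` was passed to `dvc import`. The URL, tag, hash, and the `has_fixed_rev` helper name are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical import stage contents, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
deps:
- path: model.pkl
  repo:
    url: https://example.com/hypothetical/repo
    rev: v1.0
    rev_lock: 0123456789abcdef0123456789abcdef01234567
EOF

# Succeeds if the DVC-file has a `rev` subfield.
# The pattern requires a colon right after "rev", so `rev_lock:` does not match.
has_fixed_rev() {
  grep -Eq '^[[:space:]]*rev:[[:space:]]' "$1"
}

if has_fixed_rev "$sample"; then
  echo "rev subfield present: import was created with --rev"
else
  echo "no rev subfield: import was created without --rev"
fi
rm -f "$sample"
```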

> See the examples in
> [External Dependencies](/doc/user-guide/external-dependencies) for more
16 changes: 8 additions & 8 deletions static/docs/user-guide/external-dependencies.md
@@ -108,7 +108,7 @@ $ dvc run -d remote://example/data.txt \
```

Please refer to `dvc remote add` for more details like setting up access
credentials for certain remotes.
credentials for the different remotes.

## Example: import-url command

@@ -144,16 +144,16 @@ outs:
DVC checks the headers returned by the server, looking for a strong
[ETag](https://en.wikipedia.org/wiki/HTTP_ETag) or a
[Content-MD5](https://tools.ietf.org/html/rfc1864) header, and uses it to know
if the file has changed and we need to download it again.
[Content-MD5](https://tools.ietf.org/html/rfc1864) header, and uses it to
determine whether the source has changed and we need to download the file again.
</details>
## Example: Using import
`dvc import` can download a <abbr>data artifact</abbr> from an external
<abbr>DVC repository</abbr>repository. It also creates an external dependency in
its <abbr>import stage</abbr> (DVC-file).
`dvc import` can download a <abbr>data artifact</abbr> from any <abbr>DVC
repository</abbr>. It also creates an external dependency in its <abbr>import
stage</abbr> (DVC-file).

```dvc
$ dvc import git@github.com:iterative/example-get-started model.pkl
@@ -184,7 +184,7 @@ outs:
persist: false
```

For external sources that are <abbr>DVC repositories</abbr>, `url` and
`rev_lock` fields are used to specify the origin and version of the dependency.
The `url` and `rev_lock` subfields under `repo` are used to save the origin and
version of the dependency.

</details>
