Merge pull request #800 from iterative/jorgeorpinel

Improve re-importing example in `dvc import`

Showing 7 changed files with 82 additions and 65 deletions.
````diff
@@ -24,7 +24,7 @@ DVC provides an easy way to reuse datasets, intermediate results, ML models, or
 other files and directories tracked in another <abbr>DVC repository</abbr> into
 the workspace. The `dvc import` command downloads such a <abbr>data
 artifact</abbr> in a way that it is tracked with DVC, so it can be updated when
-the data source changes.
+the data source changes. (See `dvc update`.)
 
 The `url` argument specifies the address of the Git repository containing the
 source <abbr>project</abbr>. Both HTTP and SSH protocols are supported for
@@ -92,10 +92,10 @@ repository</abbr>, such as our
 [get started example repo](https://github.com/iterative/example-get-started).
 
 ```dvc
-$ dvc import git@github.com:iterative/example-get-started data/data.xml
-Importing 'data/data.xml (git@github.com:iterative/example-get-started)' -> 'data.xml'
-...
-Saving information to 'data.xml.dvc'.
+$ dvc import git@github.com:iterative/example-get-started \
+data/data.xml
+Importing 'data/data.xml (git@github.com:iterative/example-get-started)'
+-> 'data.xml'
 ```
 
 In contrast with `dvc get`, this command doesn't just download the data file,
@@ -121,14 +121,27 @@ outs:
 ```
 
 Several of the values above are pulled from the original stage file
-`model.pkl.dvc` in the external DVC repository. `url` and `rev_lock` fields are
-used to specify the origin and version of the dependency.
+`model.pkl.dvc` in the external DVC repository. The `url` and `rev_lock`
+subfields under `repo` are used to save the origin and version of the
+dependency.
 
 ## Example: fixed revisions & re-importing
 
-When the `--rev` option is used, the import stage
-([DVC-file](/doc/user-guide/dvc-file-format)) will include a `rev` field under
-`repo` like this:
+To import a specific revision of a <abbr>data artifact</abbr>, we may use the
+`--rev` option:
+
+```dvc
+$ dvc import --rev cats-dogs-v1 \
+git@github.com:iterative/dataset-registry.git \
+use-cases/cats-dogs
+Importing
+'use-cases/cats-dogs (git@github.com:iterative/dataset-registry.git)'
+-> 'cats-dogs'
+```
+
+When using this option, the import stage
+([DVC-file](/doc/user-guide/dvc-file-format)) will also have a `rev` subfield
+under `repo`:
 
 ```yaml
 deps:
@@ -139,22 +152,25 @@ deps:
 rev_lock: 0547f5883fb18e523e35578e2f0d19648c8f2d5c
 ```
 
-If the Git revision moves, such as a branch, this doesn't have much of an effect
-on the import/update workflow. However, for static refs such as tags (unless
-manually updated), or for SHA commits, `dvc update` will not have any effect on
-the import. In this cases, in order to actually "update" an import, it's
-necessary to **re-import the data** instead, by using `dvc import` again without
-or with a different `--rev`. For example:
+If the
+[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
+moves (e.g. a branch), you may use `dvc update` to bring the data up to date.
+However, for typically static references (e.g. tags), or for SHA commits, in
+order to actually "update" an import, it's necessary to **re-import the data**
+instead, by using `dvc import` again without or with a different `--rev`. This
+will overwrite the import stage (DVC-file), either removing or replacing the
+`rev` field, respectively. This can produce an import stage that is able to be
+updated normally with `dvc update` going forward. For example:
 
 ```dvc
 $ dvc import --rev master \
 git@github.com:iterative/dataset-registry.git \
 use-cases/cats-dogs
 ```
 
-This will overwrite the import stage (DVC-file) either removing or replacing the
-`rev` field. This can produce an import stage that is able to be updated
-normally with `dvc update` going forward.
+> In the above example, the value for `rev` in the new import stage will be
+> `master`, which happens to be the default branch in this Git repository, so
+> the command is equivalent to not using `--rev` at all.
 
 ## Example: Data registry
 
````
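The re-importing example in the diff above hinges on whether an import stage records a `rev` subfield next to `rev_lock`. As a rough way to check that from the command line, here is a minimal shell sketch. The function name `import_pin_info` is hypothetical (it is not part of DVC), and it assumes the simple layout shown in the diff, with each `repo` subfield on its own line; it is a line-oriented `sed` scan, not a YAML parser.

```shell
# Hypothetical helper (not part of DVC): print the `rev` and `rev_lock`
# subfields recorded in an import stage (DVC-file).
# Assumes each subfield sits on its own line, e.g. "    rev: cats-dogs-v1";
# this is a line-oriented scan, not a YAML parser.
import_pin_info() {
  stage_file="$1"
  # `rev:` only matches the exact key, so `rev_lock:` lines are not picked up.
  rev=$(sed -n 's/^[[:space:]]*rev:[[:space:]]*//p' "$stage_file" | head -n 1)
  rev_lock=$(sed -n 's/^[[:space:]]*rev_lock:[[:space:]]*//p' "$stage_file" | head -n 1)
  if [ -n "$rev" ]; then
    echo "rev: $rev"
  else
    # No `rev` subfield means `--rev` was not used when importing.
    echo "rev: (none)"
  fi
  echo "rev_lock: $rev_lock"
}
```

For instance, assuming the import produced a stage file named `cats-dogs.dvc`, running `import_pin_info cats-dogs.dvc` before and after re-importing would show whether `rev` is still pinned to a fixed revision.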
````diff
@@ -25,16 +25,17 @@ DVC-file `targets` as command arguments.
 
 Note that import stages are considered always "locked", meaning that if you run
 `dvc repro`, they won't be updated. `dvc update` is the only command that can
-update them. Also, for `dvc import` import stages, the `rev_lock` field is
-updated by `dvc update`.
+update them.
 
 Another detail to note is that when the `--rev` (revision) option of
 `dvc import` has been used to create an import stage, DVC is not aware of what
 kind of
 [Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References) this
-is, for example a branch or a tag. For static refs such as tags (unless manually
-updated), or for SHA commits, `dvc update` will not have any effect on the
-import.
+is, for example a branch or a tag. For typically static references (e.g. tags),
+or for SHA commits, `dvc update` will not have any effect on the import. Refer
+to the
+[re-importing example](/doc/command-reference/import#example-fixed-revisions-re-importing)
+to learn how to "update" fixed-revision imports.
 
 ## Options
 
@@ -52,10 +53,8 @@ Let's first import a data artifact from our
 
 ```dvc
 $ dvc import git@github.com:iterative/example-get-started model.pkl
-Importing 'model.pkl (git@github.com:iterative/example-get-started)' -> 'model.pkl'
-...
-Saving information to 'model.pkl.dvc'.
-...
+Importing 'model.pkl (git@github.com:iterative/example-get-started)'
+-> 'model.pkl'
 ```
 
 As DVC mentions, the import stage (DVC-file) `model.pkl.dvc` is created. This
@@ -73,4 +72,6 @@ Saving information to 'model.pkl.dvc'.
 This time nothing has changed, since the source <abbr>project</abbr> is rather
 stable.
 
-> Refer to this [re-importing example]() for
+> Note that `dvc update` updates the `rev_lock` field of the corresponding
+> [DVC-file](/doc/user-guide/dvc-file-format) (when there are changes to bring
+> in).
````
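Since DVC does not record whether a `--rev` value is a branch or a tag, one way to find out before relying on `dvc update` is to ask Git directly. The sketch below is an illustrative helper, not a DVC feature: the name `git_rev_kind` is hypothetical, and it relies on `git ls-remote --heads`/`--tags` with `--exit-code`. Because `git ls-remote` matches the given name against the tail of ref names, the result is only an approximation.

```shell
# Hypothetical helper (not part of DVC): classify REV in the Git repository
# at URL (a remote URL or a local path) as "branch", "tag", or "other".
# `git ls-remote` matches REV against the tail of ref names, so this is an
# approximation: a branch named "feature/v1" would also match "v1".
git_rev_kind() {
  url="$1"
  rev="$2"
  # --exit-code makes ls-remote exit with status 2 when nothing matches.
  if git ls-remote --exit-code --heads "$url" "$rev" >/dev/null 2>&1; then
    echo branch
  elif git ls-remote --exit-code --tags "$url" "$rev" >/dev/null 2>&1; then
    echo tag
  else
    # Not a branch or tag name (e.g. a commit SHA), or the repository
    # could not be reached.
    echo other
  fi
}
```

For example, `git_rev_kind git@github.com:iterative/dataset-registry.git cats-dogs-v1` asks the registry what kind of ref `cats-dogs-v1` is. A `branch` result suggests `dvc update` can bring in new commits, while `tag` or `other` suggests re-importing instead.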
````diff
@@ -108,7 +108,7 @@ $ dvc run -d remote://example/data.txt \
 ```
 
 Please refer to `dvc remote add` for more details like setting up access
-credentials for certain remotes.
+credentials for the different remotes.
 
 ## Example: import-url command
 
@@ -119,7 +119,6 @@ external path or URL types.
 ```dvc
 $ dvc import-url https://data.dvc.org/get-started/data.xml
 Importing 'https://data.dvc.org/get-started/data.xml' -> 'data.xml'
-...
 ```
 
 The command above creates the <abbr>import stage</abbr> (DVC-file)
@@ -144,22 +143,21 @@ outs:
 
 DVC checks the headers returned by the server, looking for a strong
 [ETag](https://en.wikipedia.org/wiki/HTTP_ETag) or a
-[Content-MD5](https://tools.ietf.org/html/rfc1864) header, and uses it to know
-if the file has changed and we need to download it again.
+[Content-MD5](https://tools.ietf.org/html/rfc1864) header, and uses it to
+determine whether the source has changed and we need to download the file again.
 
 </details>
 
 ## Example: Using import
 
-`dvc import` can download a <abbr>data artifact</abbr> from an external
-<abbr>DVC repository</abbr>repository. It also creates an external dependency in
-its <abbr>import stage</abbr> (DVC-file).
+`dvc import` can download a <abbr>data artifact</abbr> from any <abbr>DVC
+repository</abbr>. It also creates an external dependency in its <abbr>import
+stage</abbr> (DVC-file).
 
 ```dvc
 $ dvc import git@github.com:iterative/example-get-started model.pkl
-Importing 'model.pkl (git@github.com:iterative/example-get-started)' -> 'model.pkl'
-Preparing to download data from 'https://remote.dvc.org/get-started'
-...
+Importing 'model.pkl (git@github.com:iterative/example-get-started)'
+-> 'model.pkl'
 ```
 
 The command above creates `model.pkl.dvc`, where the external dependency is
@@ -184,7 +182,7 @@ outs:
 persist: false
 ```
 
-For external sources that are <abbr>DVC repositories</abbr>, `url` and
-`rev_lock` fields are used to specify the origin and version of the dependency.
+The `url` and `rev_lock` subfields under `repo` are used to save the origin and
+version of the dependency.
 
 </details>
````
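The `import-url` text in the diff above says DVC looks for a strong ETag or a Content-MD5 header to decide whether to download the file again. To inspect which of these a given server returns, here is a small shell sketch that reads HTTP response headers on stdin. The function name `change_marker` and its preference for a strong ETag over Content-MD5 are assumptions for illustration; it does not reproduce DVC's own logic.

```shell
# Hypothetical helper (not part of DVC): read HTTP response headers on stdin
# and print the first usable change marker. A strong ETag is preferred
# (an assumption); weak ETags (prefixed with W/) are skipped.
# Returns 1 when neither a strong ETag nor Content-MD5 is present.
change_marker() {
  # HTTP header lines end with CRLF; drop the CRs before matching.
  headers=$(tr -d '\r')
  # Header names are case-insensitive, hence grep -i.
  etag=$(printf '%s\n' "$headers" | grep -i '^etag:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//')
  md5=$(printf '%s\n' "$headers" | grep -i '^content-md5:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//')
  case "$etag" in
    ''|W/*) ;;  # missing or weak ETag: not usable
    *) echo "etag $etag"; return 0 ;;
  esac
  if [ -n "$md5" ]; then
    echo "content-md5 $md5"
    return 0
  fi
  return 1
}
```

A possible usage is `curl -sI https://data.dvc.org/get-started/data.xml | change_marker`, which sends a HEAD request and prints whichever marker the server provides, if any.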