Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
get: add example about checking out different artifact versions
per #611 (comment) but for #487
1 parent af424ca, commit 6fd5104
Showing 2 changed files with 70 additions and 7 deletions.
@@ -53,15 +53,16 @@ created in the current working directory, with its original file name.

 - `-v`, `--verbose` - displays detailed tracing information.

-## Examples
+## Example: Machine learning model deployment

-> Note that `dvc get` can be used form anywhere in the file system, as long as
+> Note that `dvc get` can be used from anywhere in the file system, as long as
 > DVC is [installed](/doc/get-started/install).

 We can use `dvc get` to download the resulting model file from our
 [get started example repo](https://github.com/iterative/example-get-started),
-which is a DVC project external to the current working directory). The desired
-file is located in the root of the external repo, and named `model.pkl`.
+which is a DVC repository external to the current working directory). The
+desired file is tracked in the root of the external <abbr>project</abbr>, and
+named `model.pkl`.

 ```dvc
 $ dvc get https://github.com/iterative/example-get-started model.pkl
@@ -72,8 +73,8 @@ model.pkl
 ```

 Note that the `model.pkl` file doesn't actually exist in the
-[data directory](https://github.com/iterative/example-get-started/tree/master/)
-of the external Git repo. Instead, the corresponding DVC-file
+[root directory](https://github.com/iterative/example-get-started/tree/master/)
+of the external Git repository. Instead, the corresponding DVC-file
 [train.dvc](https://github.com/iterative/example-get-started/blob/master/train.dvc)
 is found, which specifies `model.pkl` in its outputs (`outs`). DVC then
 [pulls](/doc/commands-reference/pull) the file from the default
@@ -91,3 +92,65 @@ can be automated leveraging DVC with
 The same example applies to raw or intermediate data files as well, of course,
 for cases where we want to download those files and perform some analysis on
 them.
+
+## Example: Compare different versions of the same experiment
+
+`dvc get` has the `--rev` option, to specify which version of the repository to
+download a <abbr>data artifact</abbr> from. It also has the `--out` option to
+specify the file or directory path and file name for the download. Combining
+these two options allows us to do something we can't achieve with the regular
+`git checkout` + `dvc checkout` process – see for example the
+[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
+Started_ section.
+
+Let's use the
+[get started example repo](https://github.com/iterative/example-get-started)
+again, like in the previous example. But this time, clone it first to see
+`dvc get` in action inside a <abbr>DVC project</abbr>.
+
+```dvc
+$ git clone git@github.com:iterative/example-get-started.git
+$ cd example-get-started
+```
+
+If you are familiar with our [Get Started](/doc/get-started) example, you may
+know that each chapter has a corresponding
+[tag](https://github.com/iterative/example-get-started/tags). Tag `7-train` is
+where we train a first version of the example model, and tag `9-bigrams-model`
+has an improved model (trained using bigrams). What if we wanted to have both
+versions of the model "checked out" at the same time? `dvc get` provides an easy
+way to do this:
+
+```dvc
+$ dvc get . model.pkl --rev 7-train --out model.monograms.pkl
+```
+
+> Notice that the `url` provided to `dvc get` above is `.`. `dvc get` accepts
+> file system paths as a "URL" to the repository to get the data from for edge
+> cases.
+
+The `model.monograms.pkl` file now contains the older version of the model. To
+get the most recent one, we use a similar command, but with
+`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
+(since it's the latest version anyway). In fact in this case using `dvc pull`
+should suffice, downloading the file as just `model.pkl`, which we can then
+rename to make it extra obvious:
+
+```dvc
+$ dvc pull train.dvc
+$ mv model.pkl model.bigrams.pkl
+```
+
+And that's it! Now we have both model files in the <abbr>workspace</abbr>, with
+different names, and not currently tracked by Git:
+
+```dvc
+$ git status
+...
+Untracked files:
+(use "git add <file>..." to include in what will be committed)
+model.bigrams.pkl
+model.monograms.pkl
+```
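
The `--rev` plus `--out` pattern in the added example extends to any number of revisions. As a minimal sketch only: the `get_versions` function, the `DRY_RUN` variable, and the `<name>.<rev>.<ext>` output naming below are hypothetical and not part of DVC. Only `dvc get`, `--rev`, and `--out` are taken from the example itself. The sketch assumes the file name has an extension and that revision names contain no `/`.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): download one tracked file at
# several revisions, writing each copy to <name>.<rev>.<ext>.
# Set DRY_RUN to any non-empty value to print the commands instead of
# running them.
get_versions() {
  repo=$1   # URL or local path of the DVC repository
  path=$2   # tracked file to download, e.g. model.pkl (must have an extension)
  shift 2   # remaining arguments are Git revisions (assumed to contain no "/")
  for rev in "$@"; do
    # model.pkl at revision 7-train becomes model.7-train.pkl
    out="${path%.*}.${rev}.${path##*.}"
    ${DRY_RUN:+echo} dvc get "$repo" "$path" --rev "$rev" --out "$out"
  done
}

# Dry run with the two tags from the example; nothing is downloaded.
DRY_RUN=1 get_versions . model.pkl 7-train 9-bigrams-model
```

Calling `get_versions` without `DRY_RUN` from inside the cloned repository would run the same two `dvc get` commands for real, leaving both model versions side by side in the workspace under revision-specific names rather than the `monograms`/`bigrams` names chosen in the example.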