From 6fd51046abacacb085c7b0b41e21be5c90476420 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 14 Sep 2019 17:04:24 -0400
Subject: [PATCH] get: add example about checking out different artifact versions

per https://github.com/iterative/dvc.org/pull/611#discussion_r322400667
but for #487
---
 static/docs/commands-reference/diff.md |  2 +-
 static/docs/commands-reference/get.md  | 75 +++++++++++++++++++++++---
 2 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/static/docs/commands-reference/diff.md b/static/docs/commands-reference/diff.md
index 6ca0e30015..fa130427b6 100644
--- a/static/docs/commands-reference/diff.md
+++ b/static/docs/commands-reference/diff.md
@@ -169,7 +169,7 @@ diff for 'data/features'
 0 files deleted, size was increased by 2.9 MB
 ```
 
-## Examples: Confirming that a target has not changed
+## Example: Confirming that a target has not changed
 
 Let's use our example repo once again, which has several
 [available tags](https://github.com/iterative/example-get-started/tags) for
diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md
index 8658bb98a5..5192ddbef8 100644
--- a/static/docs/commands-reference/get.md
+++ b/static/docs/commands-reference/get.md
@@ -53,15 +53,16 @@ created in the current working directory, with its original file name.
 
 - `-v`, `--verbose` - displays detailed tracing information.
 
-## Examples
+## Example: Machine learning model deployment
 
-> Note that `dvc get` can be used form anywhere in the file system, as long as
+> Note that `dvc get` can be used from anywhere in the file system, as long as
 > DVC is [installed](/doc/get-started/install).
 
 We can use `dvc get` to download the resulting model file from our
 [get started example repo](https://github.com/iterative/example-get-started),
-which is a DVC project external to the current working directory). The desired
-file is located in the root of the external repo, and named `model.pkl`.
+which is a DVC repository external to the current working directory). The
+desired file is tracked in the root of the external project, and
+named `model.pkl`.
 
 ```dvc
 $ dvc get https://github.com/iterative/example-get-started model.pkl
@@ -72,8 +73,8 @@ model.pkl
 ```
 
 Note that the `model.pkl` file doesn't actually exist in the
-[data directory](https://github.com/iterative/example-get-started/tree/master/)
-of the external Git repo. Instead, the corresponding DVC-file
+[root directory](https://github.com/iterative/example-get-started/tree/master/)
+of the external Git repository. Instead, the corresponding DVC-file
 [train.dvc](https://github.com/iterative/example-get-started/blob/master/train.dvc)
 is found, which specifies `model.pkl` in its outputs (`outs`). DVC then
 [pulls](/doc/commands-reference/pull) the file from the default
@@ -91,3 +92,65 @@ can be automated leveraging DVC with
 The same example applies to raw or intermediate data files as well, of course,
 for cases where we want to download those files and perform some analysis on
 them.
+
+## Example: Compare different versions of the same experiment
+
+`dvc get` has the `--rev` option, to specify which version of the repository to
+download a data artifact from. It also has the `--out` option to
+specify the file or directory path and file name for the download. Combining
+these two options allows us to do something we can't achieve with the regular
+`git checkout` + `dvc checkout` process – see for example the
+[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
+Started_ section.
+
+Let's use the
+[get started example repo](https://github.com/iterative/example-get-started)
+again, like in the previous example. But this time, clone it first to see
+`dvc get` in action inside a DVC project.
+
+```dvc
+$ git clone git@github.com:iterative/example-get-started.git
+$ cd example-get-started
+```
+
+If you are familiar with our [Get Started](/doc/get-started) example, you may
+know that each chapter has a corresponding
+[tag](https://github.com/iterative/example-get-started/tags). Tag `7-train` is
+where we train a first version of the example model, and tag `9-bigrams-model`
+has an improved model (trained using bigrams). What if we wanted to have both
+versions of the model "checked out" at the same time? `dvc get` provides an easy
+way to do this:
+
+```dvc
+$ dvc get . model.pkl --rev 7-train --out model.monograms.pkl
+```
+
+> Notice that the `url` provided to `dvc get` above is `.`. `dvc get` accepts
+> file system paths as a "URL" to the repository to get the data from for edge
+> cases.
+
+The `model.monograms.pkl` file now contains the older version of the model. To
+get the most recent one, we use a similar command, but with
+
+`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
+(since it's the latest version anyway). In fact in this case using `dvc pull`
+should suffice, downloading the file as just `model.pkl`, which we can then
+rename to make it extra obvious:
+
+```dvc
+$ dvc pull train.dvc
+$ mv model.pkl model.bigrams.pkl
+```
+
+And that's it! Now we have both model files in the workspace, with
+different names, and not currently tracked by Git:
+
+```dvc
+$ git status
+...
+Untracked files:
+  (use "git add ..." to include in what will be committed)
+
+	model.bigrams.pkl
+	model.monograms.pkl
+```