term : remove "Dvcfile" from versioning use case tutorial #1526

Merged
merged 9 commits into from
Jul 6, 2020
30 changes: 14 additions & 16 deletions content/docs/use-cases/versioning-data-and-model-files/tutorial.md
@@ -317,27 +317,25 @@ When you have a script that takes some data as an input and produces other data
> ```

```dvc
$ dvc run -f Dvcfile \
-d train.py -d data \
-M metrics.csv \
-o model.h5 -o bottleneck_features_train.npy -o bottleneck_features_validation.npy \
$ dvc run -n train -d train.py -d data \
-o model.h5 -o bottleneck_features_train.npy \
-o bottleneck_features_validation.npy -M metrics.csv \
python train.py
```

Similar to `dvc add`, `dvc run` creates a
[DVC-file](/doc/user-guide/dvc-files-and-directories) named `Dvcfile` (specified
using the `-f` option). It tracks all outputs (`-o`) the same way as `dvc add`
does. Unlike `dvc add`, `dvc run` also tracks dependencies (`-d`) and the
command (`python train.py`) that was run to produce the result. We call such a
DVC-file a "stage file".
`dvc run` writes a pipeline stage named `train` (specified using the `-n`
option) in [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file).
It tracks all outputs (`-o`) the same way as `dvc add` does. Unlike
`dvc add`, `dvc run` also tracks dependencies (`-d`) and the command
(`python train.py`) that was run to produce the result.
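
For illustration, the `train` stage written by the command above would look roughly like the fragment below in `dvc.yaml`. This is a sketch assuming the DVC 1.0 file format; the ordering of entries and any additional fields may differ between versions.

```yaml
# Sketch of the stage entry that `dvc run -n train ...` is expected to write.
stages:
  train:
    cmd: python train.py
    deps:
      - data
      - train.py
    outs:
      - bottleneck_features_train.npy
      - bottleneck_features_validation.npy
      - model.h5
    metrics:
      # -M marks a metrics file that is not stored in the DVC cache
      - metrics.csv:
          cache: false
```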

> At this point you could run `git add .` and `git commit` to save the `Dvcfile`
> stage file and its changed outputs to the repository.
> At this point you could run `git add .` and `git commit` to save the `train`
> stage and its outputs to the repository.

`dvc repro` will run `Dvcfile` if any of its dependencies (`-d`) changed. For
example, when we added new images to build the second version of our model, that
was a dependency change. It also updates outputs and puts them into the
<abbr>cache</abbr>.
`dvc repro` will run the `train` stage if any of its dependencies (`-d`)
changed. For example, when we added new images to build the second version of
our model, that was a dependency change. It also updates outputs and puts them
into the <abbr>cache</abbr>.
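
The idea behind "run the stage only if a dependency changed" can be sketched in a few lines of Python. This is a simplified illustration, not DVC's implementation: the helpers `file_md5`, `snapshot`, `stage_is_stale`, and `demo` are hypothetical names, the recorded checksums live in a plain dictionary, and only single files (not directories such as `data`) are handled.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def snapshot(deps: List[Path]) -> Dict[str, str]:
    """Record one checksum per dependency (hypothetical stand-in for stored state)."""
    return {str(p): file_md5(p) for p in deps}


def stage_is_stale(deps: List[Path], recorded: Dict[str, str]) -> bool:
    """True if any dependency's current checksum differs from the recorded one."""
    return any(recorded.get(str(p)) != file_md5(p) for p in deps)


def demo() -> Tuple[bool, bool]:
    """Check staleness before and after modifying one dependency."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "train.py"
        data = Path(tmp) / "data.csv"
        script.write_text("print('training')\n")
        data.write_text("cat,1\ndog,0\n")
        deps = [script, data]

        recorded = snapshot(deps)                  # state after the first run
        before = stage_is_stale(deps, recorded)    # nothing changed yet

        data.write_text("cat,1\ndog,0\nbird,1\n")  # "add new images"
        after = stage_is_stale(deps, recorded)     # a dependency changed
        return before, after
```

Calling `demo()` reports that the stage is up to date before the data file is edited and stale afterwards, which is the situation in which a rerun would be triggered.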

To make things a little simpler: if `dvc add` and `dvc checkout` provide a basic
mechanism to version control large data files or models, `dvc run` and