
term : remove "Dvcfile" from versioning use case tutorial #1526

Merged
9 commits merged on Jul 6, 2020
content/docs/use-cases/versioning-data-and-model-files/tutorial.md: 27 changes (14 additions, 13 deletions)
@@ -317,26 +317,27 @@ When you have a script that takes some data as an input and produces other data
> ```

```dvc
$ dvc run -f Dvcfile \
$ dvc run -n train \
-d train.py -d data \
-M metrics.csv \
-o model.h5 -o bottleneck_features_train.npy -o bottleneck_features_validation.npy \
-o model.h5 \
-o bottleneck_features_train.npy \
-o bottleneck_features_validation.npy \
python train.py
```

Similar to `dvc add`, `dvc run` creates a
[DVC-file](/doc/user-guide/dvc-files-and-directories) named `Dvcfile` (specified
using the `-f` option). It tracks all outputs (`-o`) the same way as `dvc add`
does. Unlike `dvc add`, `dvc run` also tracks dependencies (`-d`) and the
command (`python train.py`) that was run to produce the result. We call such a
DVC-file a "stage file".
`dvc run` creates a pipeline stage named `train` (specified using the `-n`
option) in [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file)
file. It tracks all outputs (`-o`) the same way as `dvc add` does. Unlike
Contributor

Suggested change
`dvc run` creates a pipeline stage named `train` (specified using the `-n`
option) in [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file)
file. It tracks all outputs (`-o`) the same way as `dvc add` does. Unlike
`dvc run` writes a pipeline stage named `train` (specified using the `-n`
option) in [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file).
It tracks all outputs (`-o`) the same way as `dvc add` does. Unlike

Contributor Author

run

Do we need to change it from here also?

jorgeorpinel (Contributor), Jul 6, 2020

Yes, but it's out of scope. The whole ref. needs to be rewritten soon. I'm committing my suggestion above to save time. Please just fix the formatting. Resolving here.

Contributor

Actually, the command was already rewritten, so yes, just this intro to that command should be rephrased... But still, out of scope for this PR. I'll send a separate one.

`dvc add`, `dvc run` also tracks dependencies (`-d`) and the command
(`python train.py`) that was run to produce the result.

> At this point you could run `git add .` and `git commit` to save the `Dvcfile`
> stage file and its changed outputs to the repository.
> At this point you could run `git add .` and `git commit` to save the updated
> stage and its changed outputs to the repository.

`dvc repro` will run `Dvcfile` if any of its dependencies (`-d`) changed. For
example, when we added new images to built the second version of our model, that
was a dependency change. It also updates outputs and puts them into the
`dvc repro` will run the `train` stage if any of its dependencies (`-d`) changed.
For example, when we added new images to build the second version of our model,
that was a dependency change. It also updates outputs and puts them into the
<abbr>cache</abbr>.

To make things a little simpler: if `dvc add` and `dvc checkout` provide a basic
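For reference, the new `dvc run -n train ...` command in this diff records a named stage in `dvc.yaml` instead of creating a separate `Dvcfile`. The fragment below is a minimal sketch of what that stage might look like, assuming the DVC 1.0 `dvc.yaml` format. DVC generates this file itself, and the ordering of keys and list entries may differ between versions. Because `-M` declares `metrics.csv` as a metrics file that DVC does not cache, it is listed under `metrics` with `cache: false` rather than under `outs`.

```yaml
# Sketch of dvc.yaml after `dvc run -n train ...` (DVC 1.0 format assumed;
# entry order may differ in practice)
stages:
  train:                        # stage name given with -n
    cmd: python train.py        # command DVC re-runs on `dvc repro`
    deps:                       # -d dependencies
    - data
    - train.py
    outs:                       # -o outputs, stored in the DVC cache
    - bottleneck_features_train.npy
    - bottleneck_features_validation.npy
    - model.h5
    metrics:                    # -M metrics file, kept out of the cache
    - metrics.csv:
        cache: false
```

Because the stage is addressed by name rather than by a DVC-file path, it can also be reproduced on its own with `dvc repro train`. DVC 1.0 records the hashes of these dependencies and outputs in a separate `dvc.lock` file, which is what it compares against to decide whether the stage needs to run again.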