Commit

Merge pull request #1504 from sarthakforwet/new_dvc
term: remove Dvcfile from repro cmd ref.
jorgeorpinel authored Jul 5, 2020
2 parents ee515b8 + 64897ea commit b00ef75
Showing 1 changed file with 22 additions and 35 deletions.
57 changes: 22 additions & 35 deletions content/docs/command-reference/repro.md
@@ -13,7 +13,7 @@ usage: dvc repro [-h] [-q | -v] [-f] [-s] [-c <path>] [-m] [--dry] [-i]
[--no-commit] [--downstream] [targets [targets ...]]
positional arguments:
targets Stage or .dvc file to reproduce. 'Dvcfile' by default.
targets Stage or .dvc file to reproduce
```

## Description
@@ -40,9 +40,6 @@ There's a few ways to restrict the stages that will be regenerated by this
command: by specifying stage file `targets`, or by using the `--single-item`,
`--cwd`, or other options.

If specific [DVC-files](/doc/user-guide/dvc-files-and-directories) (`targets`)
are omitted, `Dvcfile` will be assumed.

`dvc repro` does not run `dvc fetch`, `dvc pull` or `dvc checkout` to get data
files, intermediate or final results.

@@ -101,8 +98,7 @@ only execute the final stage.
(non-recursively) if multiple stage files are given as `targets`.

- `-c <path>`, `--cwd <path>` - directory within the project to reproduce from.
If no `targets` are given, it attempts to use `Dvcfile` in the specified
directory. Instead of using `--cwd`, one can alternately specify a target in a
Instead of using `--cwd`, one can alternately specify a target in a
subdirectory as `path/to/target.dvc`. This option can be useful for example
with subdirectories containing a separate pipeline that can either be
reproduced as part of the pipeline in the parent directory, or as an
Expand Down Expand Up @@ -169,7 +165,7 @@ only execute the final stage.
## Examples

For simplicity, let's build a pipeline defined below. (If you want get your
hands-on something more real, see this shot
hands-on something more real, see this short
[pipeline tutorial](/doc/tutorials/pipelines)). It takes this `text.txt` file:

```
@@ -184,18 +180,13 @@ best
And runs a few simple transformations to filter and count numbers:

```dvc
$ dvc run -f filter.dvc -d text.txt -o numbers.txt \
$ dvc run -n filter -d text.txt -o numbers.txt \
"cat text.txt | egrep '[0-9]+' > numbers.txt"
$ dvc run -f Dvcfile -d numbers.txt -d process.py -M count.txt \
$ dvc run -n count -d numbers.txt -d process.py -M count.txt \
"python process.py numbers.txt > count.txt"
```

> Note that using `-f Dvcfile` with `dvc run` above is optional, the stage file
> name would otherwise default to `count.txt.dvc`. We use `Dvcfile` in this
> example because that's the default stage file name `dvc repro` will read
> without having to provide any `targets`.
Where `process.py` is a script that, for simplicity, just prints the number of
lines:

@@ -213,23 +204,23 @@ The result of executing these `dvc run` commands should look like this:
```dvc
$ tree
.
├── Dvcfile <---- second stage with a default DVC name
├── count.txt <---- result: "2"
├── filter.dvc <---- first stage
├── dvc.lock <---- file to record pipeline state
├── dvc.yaml <---- file containing list of stages.
├── numbers.txt <---- intermediate result of the first stage
├── process.py <---- code that implements data transformation
└── text.txt <---- text file to process
```

You may want to check the contents of `Dvcfile` and `count.txt` for later
You may want to check the contents of `dvc.lock` and `count.txt` for later
reference.

Ok, now, let's run the `dvc repro` command (remember, by default it reproduces
<abbr>outputs</abbr> tracked in `Dvcfile`, in this case `count.txt`):
Ok, now, let's run the `dvc repro` command:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'filter' didn't change, skipping
Stage 'count' didn't change, skipping
Data and pipelines are up to date.
```

@@ -247,17 +238,14 @@ If we now run `dvc repro`, we should see this:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'Dvcfile' changed.
Reproducing 'Dvcfile'
Running command:
python process.py numbers.txt > count.txt
Output 'count.txt' doesn't use cache. Skipping saving.
Saving information to 'Dvcfile'.
Stage 'filter' didn't change, skipping
Running stage 'count' with command:
python3 process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
```

You can now check that `Dvcfile` and `count.txt` have been updated with the new
information and updated dependency/output file hash values, and a new result,
You can now check that `dvc.lock` and `count.txt` have been updated with the new
information: updated dependency/output file hash values, and a new result,
respectively.

## Example: Downstream
@@ -277,14 +265,13 @@ Now, using the `--downstream` option results in the following output:

```dvc
$ dvc repro --downstream
WARNING: assuming default target 'Dvcfile'.
Data and pipelines are up to date.
```

The reason being that the `text.txt` file is a dependency in the target
[DVC-file](/doc/user-guide/dvc-files-and-directories) (`Dvcfile` by default).
This `Dvcfile` stage is dependent on `filter.dvc`, which happens first in this
pipeline (shown in the following figure):
The reason being that the `text.txt` file is a dependency in the last stage of
the pipeline (used by default by `dvc repro`), This last `count` stage is
dependent on `filter` stage, which happens first in this pipeline (shown in the
following figure):

```dvc
$ dvc dag
@@ -296,6 +283,6 @@ $ dvc dag
*
*
.---------.
| Dvcfile |
| count |
`---------'
```
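
The `count` stage in the example above runs `process.py`, which the changed text describes only as a script that prints the number of lines; its body falls outside the hunks shown in this diff. As a minimal sketch of what such a script might look like (the `count_lines` helper name is an assumption for illustration, not taken from the repository):

```python
import sys


def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    with open(path, "r") as f:
        # Iterating over a file object yields one item per line.
        return sum(1 for _ in f)


# Hypothetical entry point mirroring the `count` stage command:
#   python process.py numbers.txt > count.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    print(count_lines(sys.argv[1]))
```

With a script along these lines, editing `process.py` changes a dependency of the `count` stage only, which is consistent with the `dvc repro` output in the diff: `filter` is skipped and `count` is rerun.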
