repro 1.x : Updated downstream example and added info for Sequential execution. #1624

Merged
merged 18 commits into from
Jul 31, 2020
Changes from 11 commits
34217b8
cmd: rewrite Downstream example and added info for sequential executi…
sarthakforwet Jul 24, 2020
a071b7d
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
edff33e
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
71a5088
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 24, 2020
163ed19
cmd: Updated Downstream example
sarthakforwet Jul 25, 2020
cf873a4
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 27, 2020
ca04fb0
repro: Updated Downstream example
sarthakforwet Jul 28, 2020
dded2d7
Merge branch 'repro_misc' of github.com:sarthakforwet/dvc.org into re…
sarthakforwet Jul 28, 2020
30ce7bb
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 29, 2020
66e0603
cmd: updated last para for the description of --downstream and improv…
sarthakforwet Jul 29, 2020
8597f53
repro.md: updated Downstream example
sarthakforwet Jul 30, 2020
70b7d2a
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
73499f2
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
e40402c
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
1696951
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
bfe6800
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
e83bc5d
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
012b72f
Update content/docs/command-reference/repro.md
jorgeorpinel Jul 31, 2020
66 changes: 38 additions & 28 deletions content/docs/command-reference/repro.md
@@ -54,8 +54,7 @@ other options.

 It saves all the data files, intermediate or final results into the <abbr>DVC
 cache</abbr> (unless the `--no-commit` option is used), and updates the hash
-values of changed dependencies and outputs in the DVC files (`dvc.lock` and
-`.dvc`).
+values of changed dependencies and outputs in the `dvc.lock` and `.dvc` files.

 ### Parallel stage execution

@@ -83,11 +82,12 @@ $ dvc dag
 ```

 This pipeline consists of two parallel branches (`A` and `B`), and the final
-"result" stage, where the branches merge. To reproduce both branches at the same
-time, you could run `dvc repro A2` and `dvc repro B2` at the same time (e.g. in
-separate terminals). After both finish successfully, you can then run
-`dvc repro train`: DVC will know that both branches are already up-to-date and
-only execute the final stage.
+`train` stage, where the branches merge. If you run `dvc repro` at this point,
+it would reproduce each branch sequentially before `train`. To reproduce both
+branches simultaneously, you could run `dvc repro A2` and `dvc repro B2` at the
+same time (e.g. in separate terminals). After both finish successfully, you can
+then run `dvc repro train`: DVC will know that both branches are already
+up-to-date and only execute the final stage.

 ## Options
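
The paragraph changed in this hunk suggests running `dvc repro A2` and `dvc repro B2` in separate terminals, then `dvc repro train`. As a rough illustration only, the same sequence could be scripted. The helper name `run_branches_then_final` is hypothetical and not part of DVC; it only wraps Python's standard `subprocess` module, and it assumes the two concurrent invocations behave as the text describes.

```python
import subprocess


def run_branches_then_final(branch_cmds, final_cmd):
    """Run each branch command concurrently, then the final command.

    Returns True only if every branch and the final command exit with 0.
    The final command is skipped if any branch fails.
    (Hypothetical helper for illustration; not a DVC API.)
    """
    # Start every branch as its own process so they run at the same time.
    procs = [subprocess.Popen(cmd) for cmd in branch_cmds]
    # Wait for all branches to finish and collect their exit codes.
    codes = [proc.wait() for proc in procs]
    if any(code != 0 for code in codes):
        return False
    # Both branches succeeded, so only the merging stage is left to run.
    return subprocess.run(final_cmd).returncode == 0


# Usage with the stage names from the text (needs DVC and that pipeline):
# run_branches_then_final(
#     [["dvc", "repro", "A2"], ["dvc", "repro", "B2"]],
#     ["dvc", "repro", "train"],
# )
```

Skipping the final command when a branch fails mirrors the "after both finish successfully" condition in the paragraph.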

@@ -151,7 +151,8 @@ only execute the final stage.
   each execution, meaning the cache cannot be trusted for such stages.

 - `--downstream` - only execute the stages after the given `targets` in their
-  corresponding pipelines, including the target stages themselves.
+  corresponding pipelines, including the target stages themselves. This option
+  has no effect if `targets` are not provided.

 - `-h`, `--help` - prints the usage/help message, and exit.

@@ -239,7 +240,7 @@ If we now run `dvc repro`, we should see this:
 $ dvc repro
 Stage 'filter' didn't change, skipping
 Running stage 'count' with command:
-python3 process.py numbers.txt > count.txt
+python process.py numbers.txt > count.txt
 Updating lock file 'dvc.lock'
 ```
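
The `count` stage above runs `python process.py numbers.txt > count.txt`, but the script itself is not part of this diff. A minimal sketch of what such a script might look like follows, assuming it simply counts the lines of the file passed as its first argument. Only the two `print` calls and the names `sys.argv[1]` and `num_lines` come from the changed docs; `count_lines` and `main` are hypothetical.

```python
import sys


def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def main(argv):
    # argv[1] is the input file, e.g. numbers.txt in the stage command.
    num_lines = count_lines(argv[1])
    print(f"Number of lines in {argv[1]}:")
    print(num_lines)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv)
    else:
        # Assumed usage message; the real script may behave differently.
        print("usage: python process.py <file>", file=sys.stderr)
```

Because the stage redirects standard output to `count.txt`, any change to what the script prints changes that output file.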

@@ -262,31 +263,40 @@ The answer to universe is 42
 - The Hitchhiker's Guide to the Galaxy
 ```

-Now, using the `--downstream` option results in the following output:
+Let's say we want to print the filename also in the description and so we update
+the `process.py` as:
+
+```python
+print(f'Number of lines in {sys.argv[1]}:')
+print(num_lines)
+```
+
+Now, using the `--downstream` option with `dvc repro`, results in the execution
+of stages after the target stage (`count` in this case) in the pipeline.

 ```dvc
-$ dvc repro --downstream
-Data and pipelines are up to date.
+$ dvc repro --downstream count
+Running stage 'count' with command:
+python process.py numbers.txt > count.txt
+Updating lock file 'dvc.lock'
 ```

-The reason being that the `text.txt` file is not a dependency in the last stage
-of the pipeline, used as the default target by `dvc repro`. `text.txt` is a
-dependency of the `filter` stage, which happens earlier (shown in the figure
-below), so it's skipped given the `--downstream` option.
+The change in `text.txt` is ignored because that file is a dependency in the
+`filter` stage, which did not get updated in the above command. This is because
+`filter` happens before `count` in the pipeline (shown below).

 ```dvc
 $ dvc dag
-.------------.
-|   filter   |
-`------------'
-      *
-      *
-      *
-.---------.
-|  count  |
-`---------'
+
++--------+
+| filter |
++--------+
+    *
+    *
+    *
++-------+
+| count |
++-------+
 ```

-> Note that using `dvc repro --downstream` without a target will always have a
-> similar effect, where all previous stages are ignored — only if the last stage
-> is changed will it have any effect.
+> Refer to `dvc dag` for more details on that command.
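
To make the `--downstream` wording in this diff concrete ("the stages after the given `targets` ..., including the target stages themselves"), here is a small sketch of that selection rule on the two-stage example pipeline. It is not DVC's implementation: `downstream_stages` and the `dependents` mapping are hypothetical names, and the sketch only selects stages; it does not decide execution order or whether a stage has changed.

```python
from collections import deque


def downstream_stages(dependents, targets):
    """Return the targets plus every stage reachable after them.

    `dependents` maps a stage name to the stages that consume its outputs.
    (Hypothetical helper for illustration; not a DVC API.)
    """
    selected = []
    seen = set()
    queue = deque(targets)
    while queue:
        stage = queue.popleft()
        if stage in seen:
            continue
        seen.add(stage)
        selected.append(stage)
        # Follow the edges forward only; upstream stages are never visited.
        queue.extend(dependents.get(stage, []))
    return selected


# The two-stage pipeline from the example: filter feeds count.
pipeline = {"filter": ["count"], "count": []}
print(downstream_stages(pipeline, ["count"]))   # ['count']
print(downstream_stages(pipeline, ["filter"]))  # ['filter', 'count']
```

With `count` as the target, `filter` is never selected, which matches the example output where the change in `text.txt` is ignored. The case with no targets is not modeled here.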