docs: repro: add --pull #1841

Merged
merged 2 commits into from
Oct 6, 2020
6 changes: 5 additions & 1 deletion content/docs/command-reference/repro.md
@@ -10,7 +10,8 @@ analyzing dependencies and <abbr>outputs</abbr> of the target stages.
```usage
usage: dvc repro [-h] [-q | -v] [-f] [-s] [-c <path>] [-m] [--dry] [-i]
[-p] [-P] [-R] [--no-run-cache] [--force-downstream]
[--no-commit] [--downstream] [targets [targets ...]]
[--no-commit] [--downstream] [--pull]
[targets [targets ...]]

positional arguments:
targets Stage or .dvc file to reproduce
@@ -154,6 +155,9 @@ up-to-date and only execute the final stage.
corresponding pipelines, including the target stages themselves. This option
has no effect if `targets` are not provided.

- `--pull` - try automatically [pulling](/doc/command-reference/pull) missing
cache for outputs restored from run-cache.
Comment on lines +158 to +159
Contributor

@jorgeorpinel jorgeorpinel Oct 14, 2020

Back on this. Per iterative/dvc#4538 (comment):

> dvc repro --pull pulls regular files, hashes for which might've been restored from the existing run-cache, so kinda like regular dvc pull

Unfortunately I don't understand either one of the explanations. What's the relationship between run-cache and repro --pull? Maybe a step-by-step explanation like:

1. Use repro --pull.
2. run-cache is checked before executing commands (default repro behavior, I think).
3. Some output hashes are found? (But not the actual files? This is the confusing part.)
4. Hashes are looked for in the cache but not found.
5. The files are looked for in remote storage.

Something like that.

Please @efiop ! Thanks in advance

Contributor Author

@jorgeorpinel Even if we leave the run-cache out, repro --pull would still try to dvc pull outputs that are missing when the pipeline itself didn't change. E.g. when you forgot to dvc pull beforehand and you are trying to dvc repro an otherwise up-to-date pipeline, dvc repro --pull will just pull the outputs for such stages instead of trying to reproduce them.

Run-cache is then just a special source of lock files, and repro --pull works the same way as explained above.

Want to point out again that --pull is still a temporary solution that was needed to improve pull --run-cache that is also not complete in a product sense. So I would recommend not spending much time on this, as the product scenario is WIP and there is no reason to optimize the docs for it too much.
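
To make the behavior described in this reply concrete, here is a toy model of the per-stage decision. It is only a sketch under stated assumptions: `Stage`, `plan`, and the action names are hypothetical and are not DVC's actual classes or implementation. It only encodes the rule from the comment above, that an unchanged stage with missing outputs is pulled instead of reproduced when `--pull` is given.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Toy stand-in for a pipeline stage (hypothetical, not a DVC class)."""
    name: str
    changed: bool            # deps or command differ from what the lock file records
    outputs_in_cache: bool   # output files are present in the local cache


def plan(stage: Stage, pull: bool = False) -> str:
    """Return the action this toy model picks for one stage."""
    if stage.changed:
        return "run"     # the pipeline changed, so the stage is re-executed
    if stage.outputs_in_cache:
        return "skip"    # up to date and nothing is missing
    # Up to date according to the lock file, but the output files are missing
    # locally: with pull=True fetch them, otherwise fall back to reproducing.
    return "pull" if pull else "run"


forgot_to_pull = Stage("train", changed=False, outputs_in_cache=False)
print(plan(forgot_to_pull))             # run
print(plan(forgot_to_pull, pull=True))  # pull
```

The "forgot to dvc pull" case is the last two lines: the same stage is reproduced without the flag and pulled with it.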

Contributor

OK it makes more sense now, thanks.

In this case I do feel like I need to spend enough time understanding what's going on so that when the coming bulk of docs related to new features hits, I'm better prepared. So thanks again for bearing with me!

Contributor

Last Q @efiop. Does this only check the default remote (if one is set)? Or all remotes?

Contributor

Actually, 2 more questions...

  • Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?
  • What happens if you do repro --pull --no-run-cache? Is the run-cache check skipped?

Thanks!

Contributor Author

> Does this only check the default remote (if one is set)? Or all remotes?

Only the default remote right now.

> Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?

Yes, only local run-cache.

> What happens if you do repro --pull --no-run-cache? Is the run-cache check skipped?

Correct. It will only pull if you have your lock file complete (so hashes are already there, just the outputs are missing from cache), but won't try to use run-cache.

Please feel free to ask any questions, I do understand that this incomplete feature is a bit confusing.
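
The three answers above can be combined into one small sketch. This is a toy model, not DVC code: `hash_source` and `action` are hypothetical names, the stage's outputs are assumed to be missing from the local cache, and falling back to running the stage when the default remote does not have the files is an assumption, not something confirmed in this thread.

```python
from typing import Optional


def hash_source(lock_complete: bool, local_run_cache_hit: bool,
                no_run_cache: bool = False) -> Optional[str]:
    """Where the output hashes come from in this toy model, or None."""
    if lock_complete:
        return "lock file"          # hashes are already recorded
    if local_run_cache_hit and not no_run_cache:
        return "local run-cache"    # only the local run-cache is consulted
    return None                     # no known hashes for this stage


def action(lock_complete: bool, local_run_cache_hit: bool,
           in_default_remote: bool, no_run_cache: bool = False) -> str:
    """Action for a stage whose outputs are missing from the local cache."""
    if hash_source(lock_complete, local_run_cache_hit, no_run_cache) is None:
        return "run"                # nothing to pull, so execute the stage
    # Only the default remote is checked; running on a miss is an assumption.
    return "pull" if in_default_remote else "run"


# Incomplete lock file, run-cache hit, files available in the default remote:
print(action(False, True, True))                     # pull
# Same situation with --no-run-cache: the run-cache is not used.
print(action(False, True, True, no_run_cache=True))  # run
# Complete lock file: --no-run-cache does not prevent the pull.
print(action(True, False, True, no_run_cache=True))  # pull
```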


- `-h`, `--help` - prints the usage/help message, and exit.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if all