Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repro --pull implies --allow-missing #10434

Closed
wants to merge 1 commit into from
Closed

Conversation

dberenbaum
Copy link
Collaborator

See #10412 (comment). When we introduced --allow-missing, we also updated the behavior of --pull, but kept them as separate flags for a couple reasons (see discussion here):

  • didn't want to change behavior of --pull too much
  • worried about too much complexity in one flag
  • wanted to wait and see how it was used

Since then, every user I have encountered expects the behavior to be what --pull --allow-missing does. I don't see any valid case where someone using --pull would not want --allow-missing (we are basically already implying this by treating missing data as needing to be pulled).

@dberenbaum dberenbaum requested a review from skshetry May 20, 2024 14:49
"Try automatically pulling missing cache for outputs restored "
"from the run-cache."
"Try automatically pulling missing dependencies and outputs. "
"Implies --allow-missing."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it better to deprecate then allow-missing? and here describe the behavior in a more explicit way (by not referring to another option, but actually saying what is happening)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to deprecate now because:

  1. That is a clear breaking change that I don't think we should do unless we want to do a major version change
  2. Unlike --pull, --allow-missing on its own can be useful. It was introduced to solve New command: dvc verify - check that the pipeline is up to date without having to pull or run it #5369, in which case the user doesn't want to pull anything.

Open to other ideas on how to explain it but worried about overcomplicating it. I think this is how people expect --pull works now. With --pull and --allow-missing, if you have run the stage before and pushed it, it won't get run again. Without --allow-missing, the stage may still be considered changed and run again even if it can be pulled from the run-cache.

@dberenbaum
Copy link
Collaborator Author

This raised a couple other issues when I started looking deeper:

  1. With this PR, --pull may skip unchanged stages without pulling them since --allow-missing is intended to skip these stages entirely. This saves time pulling that data but also may be unexpected/more breaking than if we pull unstaged changes.
  2. Regardless of this PR, --allow-missing will skip pull and checkout for unchanged stages, but it will still checkout unchanged stages from the local cache. This may also be expensive, and arguably we should skip checkout for these.

I'm working on an alternate PR that would address these questions.

@dberenbaum
Copy link
Collaborator Author

This raised a couple other issues when I started looking deeper:

1. With this PR, `--pull` may skip unchanged stages without pulling them since `--allow-missing` is intended to skip these stages entirely. This saves time pulling that data but also may be unexpected/more breaking than if we pull unstaged changes.

2. Regardless of this PR, `--allow-missing` will skip pull and checkout for unchanged stages, but it will still checkout unchanged stages from the local cache. This may also be expensive, and arguably we should skip checkout for these.

I'm working on an alternate PR that would address these questions.

@dberenbaum dberenbaum closed this May 21, 2024
@skshetry skshetry deleted the pull-allow-missing branch August 20, 2024 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants