checkout: consistency in handling files that are missing version info #5913
Comments
Isn't this what |
@efiop Are you waiting on product decisions here? I think we can revisit the warnings, but seems like we can implement pull behavior to match push for now. |
I'm sorry, may I ask which version info is missing? |
@karajan1001 This is the scenario as far as I remember:
EDIT: A much simpler summary is that no |
@dberenbaum We might consider treating no-version-info checkout errors as nonfatal. Checkout already shows a per-file result summary (A, D, etc.), so maybe we can surface this there instead of using push-like warnings; I need to take a look. @skshetry WDYT? ^ There is also a question of backward compatibility: if we switch to nonfatal behaviour by default, we might break things for users who rely on the failure to check whether they were able to checkout/pull successfully (checkout and pull could behave differently here; it is just an implementation detail that pull uses checkout internally). But the switch does make sense, since the current behaviour is closer to being a bug. At the same time, it looks like we might want to provide a flag (or an alternative way) to check that version info is present for particular outs/stages/etc. |
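To make the proposal above concrete, here is a minimal sketch of "nonfatal by default, with an opt-in check". It is a toy model, not DVC's code: `checkout`, `CheckoutResult`, `MissingVersionInfoError`, and the `strict` parameter are all hypothetical names, and an output with no version info is modelled simply as a path mapped to `None`.

```python
from dataclasses import dataclass, field


class MissingVersionInfoError(Exception):
    """Hypothetical error, raised only when the caller opts in to strict mode."""


@dataclass
class CheckoutResult:
    # Paths that had a recorded hash and would be restored from cache.
    restored: list = field(default_factory=list)
    # Paths with no version info; reported in the summary instead of failing.
    skipped: list = field(default_factory=list)


def checkout(outs, strict=False):
    """Toy checkout: `outs` maps output path -> recorded hash, or None if absent."""
    result = CheckoutResult()
    for path, recorded_hash in sorted(outs.items()):
        if recorded_hash is None:
            result.skipped.append(path)
        else:
            result.restored.append(path)
    # Assumption: strictness is opt-in, so existing pipelines that have not
    # produced their outputs yet do not fail by default.
    if strict and result.skipped:
        raise MissingVersionInfoError(
            "no version info for: " + ", ".join(result.skipped)
        )
    return result


if __name__ == "__main__":
    outs = {"data/raw.csv": "abc123", "model.pkl": None}
    print(checkout(outs))
```

Users who currently depend on the failure would pass `strict=True` in this sketch; everyone else would see the skipped paths in the result summary.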
I think that the incomplete checkout is a failure, be it due to missing cache or missing version info. |
Related #6039 |
@skshetry Should push also fail? In the scenario above, it seems surprising to me that push or pull would fail. Should any stage that hasn't been run yet cause a failure on push or pull? |
Ping to keep this discussion moving. Any thoughts? |
For the recently created stage, maybe we should not fail, but there could be other reasons why the version info might be missing:
|
Sure, there could be lots of reasons for not having version info. If there's no way to determine whether this is expected, isn't that what warnings are for?
So More importantly, how would a user know this, and how can we avoid confusion? This seems like a subtle and unexpected distinction if I'm running various commands and some fail and some don't, despite encountering the same issues. |
@dberenbaum, I am not clear on how it should be handled. I understand
Right now, we delete the file if we are missing version info, even though we fail. |
Okay, that does seem more dangerous! Although if it's still happening on failure, then it doesn't seem like this behavior has much to do with whether the command fails or not. Also, why does dvc delete the file? |
Ping @skshetry |
I don't have a good idea on why it deletes the file on missing version info. DVC behaves like this in a lot of places, which I'd like to fix (eg: |
Just an old assumption so that you are not confused by |
Okay, rethinking this from the high-level perspective. There's inconsistency between push and pull, but I think I'm at least as bothered by why either command should throw an error here. In CML, they set up a pipeline but don't run that pipeline yet. Then they set up a job that pulls the data and runs the pipeline. DVC is trying to push/pull the outputs of the pipeline, which don't exist yet since the pipeline hasn't run. For both push and pull, the user expects that the outputs don't exist. There are lots of scenarios where users don't expect pipeline outputs to exist:
There are two possible approaches:
It seems like we have opted for the first option, but what do you think about the second approach? It seems worth considering since we are increasingly trying to separate data management and pipelines. |
related #4746 |
It would be great to have such an option to In my use case, I define pipeline stages in |
Closing this in favor of #4746, but feel free to reopen if there are any other aspects you want to keep discussing. |
`dvc checkout` (and `dvc pull`, since it uses it internally) will error out if version info is missing, while `dvc push` will just warn us, which creates a confusing inconsistency. For example, it makes the CML guys use `|| true` for `dvc pull` https://github.com/DavidGOrtega/cml-dvc-test/runs/2438634469 CC @DavidGOrtega, as they don't have `dvc.lock` on the initial run.

The current `dvc push` behaviour with warnings is also quite annoying and doesn't scale well, as it might get way too noisy for a big pipeline. So we should probably agree on one common behaviour for all such cases that will make sense to our users. CC @dberenbaum
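
The mismatch described in this report can be illustrated with a small toy model. This is only a sketch of the reported behaviour, not DVC's implementation: `checkout`, `push`, and `CheckoutError` are hypothetical stand-ins, and "missing version info" is modelled as an output path mapped to `None`.

```python
import warnings


class CheckoutError(Exception):
    """Hypothetical stand-in for the fatal error described in the report."""


def _missing(outs):
    # Outputs with no recorded hash, e.g. stages that have not been run yet.
    return sorted(path for path, recorded in outs.items() if recorded is None)


def checkout(outs):
    """Models the checkout/pull side: any missing version info is fatal."""
    missing = _missing(outs)
    if missing:
        raise CheckoutError("missing version info: " + ", ".join(missing))
    return sorted(outs)


def push(outs):
    """Models the push side: one warning per missing output, then carry on."""
    for path in _missing(outs):
        warnings.warn(f"missing version info, skipping: {path}")
    return sorted(path for path, recorded in outs.items() if recorded is not None)


if __name__ == "__main__":
    outs = {"data/raw.csv": "abc123", "model.pkl": None}
    print(push(outs))  # warns once and still returns the pushable paths
    try:
        checkout(outs)
    except CheckoutError as exc:
        # This is the failure that the `|| true` workaround hides in CI.
        print(f"checkout failed: {exc}")
```

The same input produces a warning in one function and an exception in the other, and the per-output warning in `push` is what gets noisy for a big pipeline.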