dvcfile: support remote per output #6486
🔥 How does this work with other
remote = self.repo.cloud.get_remote_odb(self.remote)
self.repo.cloud.pull([obj.hash_info], odb=remote, **kwargs)
We need support for better error messages, such as showing where the remote was specified, in case of NoRemoteError or RemoteNotFound.
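A minimal sketch of what such an error could look like: the exception records where the remote name was set and includes that in its message. `RemoteSource`, `RemoteNotFoundError` and `resolve_remote` are hypothetical names used only for illustration here, not DVC's actual classes or functions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RemoteSource:
    """Hypothetical record of where a remote name was specified."""

    kind: str  # e.g. "output", "cli" or "config"
    location: Optional[str] = None  # e.g. path to a dvc.yaml or *.dvc file


class RemoteNotFoundError(Exception):
    """Hypothetical error that reports where the missing remote came from."""

    def __init__(self, name: str, source: RemoteSource):
        self.name = name
        self.source = source
        where = source.kind
        if source.location:
            where += f" ({source.location})"
        super().__init__(f"remote '{name}' not found, specified in {where}")


def resolve_remote(
    name: str, source: RemoteSource, configured: Dict[str, str]
) -> str:
    """Return the URL for `name`, or raise with the origin of the bad name."""
    try:
        return configured[name]
    except KeyError:
        raise RemoteNotFoundError(name, source) from None
```

With this shape, a typo in a per-output `remote:` would point the user at the file that contains it rather than only at the missing name.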
@@ -185,6 +188,7 @@ def load_from_pipeline(stage, data, typ="outs"):
Output.PARAM_CACHE,
Output.PARAM_PERSIST,
Output.PARAM_CHECKPOINT,
Output.PARAM_REMOTE,
Would there be a need for a stage-level/pipeline-level remote?
Maybe in the future, but not for now.
Okay, I was just wondering if it makes sense at the stage level, or should there be a different keyword there?
push_to: <remote> or something else.
It might be that we'll need to support pushing/pulling granularly in the future, but for now this is symmetrical. Interesting to note is that our imports are kinda like pull: orig_remote, push: false, since they are pulled from the external repo's remote but not pushed anywhere by default. So that might be a better way to handle #4581 (e.g. push: mybackup or just true for the default remote). But that's beyond the scope of this PR. Or do you see where remote: might get problematic in the future?
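To make the symmetric vs. granular idea concrete, here is a rough sketch of how separate pull/push settings could be normalized. Everything in it is an assumption for illustration: `TransferPolicy` and `parse_policy` are hypothetical names, the `pull`/`push` keys are not implemented in this PR, and treating a missing key as "use the default remote" is just one possible choice.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TransferPolicy:
    """Hypothetical: which remote an output is pulled from and pushed to."""

    pull: Optional[str]
    push: Optional[str]


def _pick(value: Any, default_remote: Optional[str]) -> Optional[str]:
    # True means "use the default remote", False/None means "do not transfer".
    if value is True:
        return default_remote
    if value is False or value is None:
        return None
    return str(value)


def parse_policy(
    entry: Dict[str, Any], default_remote: Optional[str] = None
) -> TransferPolicy:
    """Normalize an output entry into a pull/push pair."""
    # Symmetric form, as in this PR: one remote for both directions.
    if "remote" in entry:
        return TransferPolicy(pull=entry["remote"], push=entry["remote"])
    # Hypothetical granular form; a missing key falls back to the default remote.
    return TransferPolicy(
        pull=_pick(entry.get("pull", True), default_remote),
        push=_pick(entry.get("push", True), default_remote),
    )
```

Under these assumptions an import would be `{"pull": "orig_remote", "push": False}`, which yields a policy that pulls from `orig_remote` and pushes nowhere.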
No, I think of remote as a data-management thing, so I was just wondering if it is still understandable in terms of the stage or pipeline concept.
No, it doesn't. This is similar to how
So we've talked about --run-cache before and it is a little special. It is only pushed/pulled with an explicit
etc. A bit worried about overengineering this one, need to think a bit more about it, but it is obvious that it needs to be handled in a special way.
@efiop could you please provide a brief summary in the description of how it will look (or a docs PR?)? Otherwise it is hard to review for folks who are not familiar with the code base.
@shcheklein Sure, this is still WIP, so I didn't add it yet. Will do.
@@ -326,7 +332,7 @@ def __init__(
self.obj = None
self.isexec = False if self.IS_DEPENDENCY else isexec

self.def_remote = None
This was a leftover that hasn't been used at all for a while.
@@ -81,6 +81,7 @@ def loadd_from(stage, d_list):
desc = d.pop(Output.PARAM_DESC, False)
isexec = d.pop(Output.PARAM_ISEXEC, False)
live = d.pop(Output.PARAM_LIVE, False)
remote = d.pop(Output.PARAM_REMOTE, None)
Adding new entries here and in schema is getting crazy these days, we'll need to finally revisit this and parsing/schema in the future.
Basic support for specifying remotes per output in both dvc.yaml and *.dvc files. Only output level is supported (no stage or pipeline). Because this is an advanced feature, for now, this only supports setting it by hand, no special CLI flags (we could use --remote in dvc add in the future, hence why we have --remote and --to-remote there).

Examples:

For push/pull/fetch/etc the order of priority is as follows:
1. remote: per output from *.dvc/dvc.yaml
2. --remote from CLI
3. core.remote from config

Related to #2095
Docs: iterative/dvc.org#2761
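
The priority order in the description can be sketched as a simple first-match lookup. `pick_remote` and its parameters are hypothetical names for illustration, not the function this PR adds.

```python
from typing import Optional


def pick_remote(
    output_remote: Optional[str],
    cli_remote: Optional[str],
    config_remote: Optional[str],
) -> Optional[str]:
    """Return the first remote that is set, in the stated priority order."""
    # 1. remote: per output, 2. --remote from CLI, 3. core.remote from config
    for candidate in (output_remote, cli_remote, config_remote):
        if candidate is not None:
            return candidate
    return None
```

So an output with its own remote: keeps using it even when --remote is passed, and core.remote is only consulted when neither of the other two is set.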