api: add config support for open/read #9611
@daavoo I remember you were excited about this before. Could you review/give it a try, please?
@@ -73,6 +73,7 @@ def open( # noqa, pylint: disable=redefined-builtin
     remote: Optional[str] = None,
     mode: str = "r",
     encoding: Optional[str] = None,
+    config: Optional[Dict[str, Any]] = None,
I think it'd be easier and simpler if it were a remote_config, otherwise users have to be aware of our whole config structure.
Or, we could just name it config, but we'll be passing:
{
"core": {"remote": remote},
"remote": {remote: config},
}
Agreed. I would simplify even more to the structure of a single remote's config, like open(..., remote_config={"url": ..}). If there is also a remote arg, we can merge it with the config for that remote. If not, we can merge it with the default remote.
Related: iterative/dvc.org#4628 (comment).
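To make the merge rule above concrete, here is a minimal sketch in plain Python. The helper name expand_remote_config, the sample repo config, and the remote names and URLs are all hypothetical and are not part of DVC; the sketch only shows how a single remote's options could be expanded into the full config structure, using the remote arg when given and the default remote otherwise.

```python
from typing import Any, Dict, Optional


def expand_remote_config(
    repo_config: Dict[str, Any],
    remote_config: Dict[str, Any],
    remote: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: expand one remote's options into a full config override."""
    # Use the explicit remote name if given, otherwise the repo's default remote.
    name = remote or repo_config.get("core", {}).get("remote")
    if name is None:
        raise ValueError("no remote given and no default remote configured")
    # Options already configured for that remote, overridden by the new ones.
    existing = repo_config.get("remote", {}).get(name, {})
    return {
        "core": {"remote": name},
        "remote": {name: {**existing, **remote_config}},
    }


# Illustrative repo config with two remotes (names and URLs are made up).
repo_config = {
    "core": {"remote": "myremote"},
    "remote": {
        "myremote": {"url": "s3://bucket/path"},
        "other": {"url": "gs://bucket/path"},
    },
}

# No remote arg: the options are merged into the default remote.
default_override = expand_remote_config(repo_config, {"token": "secret"})
# With a remote arg: the options are merged into that remote instead.
other_override = expand_remote_config(repo_config, {"token": "secret"}, remote="other")
```

With something like this, the caller never has to know the name of the default remote or the nesting of the config sections.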
config is more powerful though, and you could set a new default remote there in one place as well. Sure, we could also do remote_config, but that's more niche and we would need to tell which remote we want it to apply to. Remember that one could have multiple remotes in the repository (plus also for remote notation), which remote_config won't handle. I chose config because it is the most complete solution, and we could add edge-case params later if there is demand.
config is more powerful though
It should be simple to use too.
we will need to tell what remote we want that to apply to
That already exists, there's the remote kwarg for that.
Remember that one could have multiple remotes in the repository
This is not applicable to the open/read API though, as they are about a single file, right?
Ideally only remote_config would be enough
Agreed. I think it can be enough in most cases where you want to use the default remote. Take this example:
with dvc.api.open("data", remote_config={"token": ...}) as f:
This is enough to set additional config options for the default remote, which I think is the most likely use case. How hard is it to add?
Compare that to how it looks with config:
with dvc.api.open("data", config={"remote": {"myremote": {"token": ...}}}) as f:
config is not only longer, but users have to know the config structure and the name of the default remote.
I would rather indeed keep config for now as the most powerful option that is great to have around even in the future.
Is there any way it will be used for anything besides remote config? I can't see any other config section that would make sense to override.
How hard is it to add?
For the default remote, not hard. But currently, if something is using a non-default one, it will work, and supporting that is the time-consuming part. I'm totally on the same page with you: remote_config is useful, I'm just not sure it is worth investing in right now since config is already there.
Is there any way it will be used for anything besides remote config? I can't see any other config section that would make sense to override.
Remote config is the prime use case, but config is the most powerful mechanism that can allow one to get out of very sticky situations (e.g. if one drops git from the repo but still wants to use it, they can set no_scm through this, which seems useful during deployment somewhere).
What's the level of effort to also add remote_config? Sorry, just repeating the question now 🤦
Let's discuss when we talk tomorrow
config is the most powerful mechanism that can allow one to get out of very sticky situations
@efiop, I think you are looking at it from a maintainer's point of view rather than the user's. We can provide a powerful mechanism in DvcFileSystem, but read/open should be simpler. remote_config= solves most of the problem imo.
Creating a new remote and/or providing a way to configure the dvc repo is more of a niche use case. Also, I don't feel comfortable exposing the whole config schema to the users. It has a large surface area and can have unintended effects when not used correctly.
Allows passing credentials and other config options programmatically, similar to the normal .dvc/config way.
Fixes #9610
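As a rough usage sketch, assuming the config keyword from the diff above is available on dvc.api.open: the remote name "myremote" and the "token" option are borrowed from the review discussion and may not match a real setup, and token_config and read_first_line are hypothetical helpers, not part of DVC.

```python
from typing import Any, Dict


def token_config(remote_name: str, token: str) -> Dict[str, Any]:
    """Hypothetical helper: build a config override that sets a token for one remote."""
    # Mirrors the layout of .dvc/config: a "remote" section keyed by remote name.
    return {"remote": {remote_name: {"token": token}}}


def read_first_line(path: str, remote_name: str, token: str) -> str:
    """Open a tracked file with the override applied and return its first line."""
    # Imported lazily so this module loads even where DVC is not installed.
    import dvc.api

    # Assumes dvc.api.open accepts the `config` keyword added in this PR.
    with dvc.api.open(path, config=token_config(remote_name, token)) as f:
        return f.readline()


# Only the override is built here; read_first_line needs a real DVC repo to run.
override = token_config("myremote", "secret")
```

The credential lives only in the call and never has to be written to .dvc/config.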