api: add config support for open/read #9611

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

efiop
Contributor

@efiop efiop commented Jun 15, 2023

Allows passing credentials and other config options programmatically, similar to the usual .dvc/config approach.

Fixes #9610

@efiop
Contributor Author

efiop commented Jun 15, 2023

@daavoo I remember you were excited about this before. Could you review/give it a try, please?

@efiop efiop requested a review from daavoo June 15, 2023 15:21
@efiop efiop added the A: api Related to the dvc.api label Jun 15, 2023
@efiop efiop merged commit 0433565 into iterative:main Jun 15, 2023
@@ -73,6 +73,7 @@ def open(  # noqa, pylint: disable=redefined-builtin
     remote: Optional[str] = None,
     mode: str = "r",
     encoding: Optional[str] = None,
+    config: Optional[Dict[str, Any]] = None,
Member

I think it'd be easier and simpler if it were a remote_config, otherwise users have to be aware of our whole config structure.

Member

@skshetry skshetry Jun 16, 2023

Or, we could just name it config but we'll be passing:

{
  "core": {"remote": remote},
  "remote": {remote: config},
}
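
To make the shape proposed above concrete, here is a minimal sketch of wrapping a single remote's options into that structure. `wrap_remote_config` is a hypothetical helper, not part of DVC, and `myremote` and the token value are placeholders.

```python
from typing import Any, Dict


def wrap_remote_config(remote: str, remote_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: expand one remote's options into a full config dict."""
    return {
        # Make the given remote the default one.
        "core": {"remote": remote},
        # Attach the options to that remote's section (copied to avoid aliasing).
        "remote": {remote: dict(remote_config)},
    }


config = wrap_remote_config("myremote", {"token": "placeholder-token"})
print(config)
```

With this shape the caller only supplies a remote name and that remote's options; the nesting is produced for them.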

Collaborator

Agreed. I would simplify even more to the structure of a single remote's config, like open(..., remote_config={"url": ..}). If there is also a remote arg, we can merge it with the config for that remote. If not, we can merge it with the default remote's config.

Related: iterative/dvc.org#4628 (comment).
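
The merge rule described in this comment could look roughly like the sketch below. It is an illustration under assumptions, not DVC's implementation: `merge_remote_config` is a hypothetical helper, and the remote names, URLs, and token are placeholders.

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def merge_remote_config(
    config: Dict[str, Any],
    remote_config: Dict[str, Any],
    remote: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: overlay remote_config onto one remote's section."""
    merged = deepcopy(config)  # leave the caller's config untouched
    # Use the explicit remote if given, otherwise fall back to the default one.
    name = remote or merged.get("core", {}).get("remote")
    if name is None:
        raise ValueError("no remote given and no default remote configured")
    section = merged.setdefault("remote", {}).setdefault(name, {})
    section.update(remote_config)
    return merged


repo_config = {
    "core": {"remote": "myremote"},
    "remote": {
        "myremote": {"url": "s3://bucket/path"},
        "backup": {"url": "s3://other/path"},
    },
}
default_merged = merge_remote_config(repo_config, {"token": "placeholder"})
backup_merged = merge_remote_config(repo_config, {"token": "placeholder"}, remote="backup")
```

Without a `remote` argument the options land on the default remote; with one, only the named remote's section changes.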

Contributor Author

config is more powerful though and you could set new default remote there in one place as well. Sure, we could also do remote_config but that's more niche and we will need to tell what remote we want that to apply to. Remember that one could have multiple remotes in the repository (plus also for remote notation), which remote_config won't handle. I chose config because it is the most complete solution, and we could add edge-case params later if there is demand.

Member

@skshetry skshetry Jun 16, 2023

> config is more powerful though

It should be simple to use too.

> we will need to tell what remote we want that to apply to

That already exists; there's the remote kwarg for that.

> Remember that one could have multiple remotes in the repository

This is not applicable to the open/read API though, as they deal with a single file, right?

Collaborator

> Ideally only remote_config would be enough

Agreed. I think it can be enough in most cases where you want to use the default remote. Take this example:

with dvc.api.open("data", remote_config={"token": ...}) as f:

This is enough to set additional config options for the default remote, which I think is the most likely use case. How hard is it to add?

Compare that to how it looks with config:

with dvc.api.open("data", config={"remote": {"myremote": {"token": ...}}}) as f:

config is not only longer, but users have to know the config structure and the name of the default remote.

> I would rather indeed keep config for now as the most powerful option that is great to have around even in the future.

Is there any way it will be used for anything besides remote config? I can't see any other config section that would make sense to override.

Contributor Author

> How hard is it to add?

For the default remote, not hard. But currently anything that uses a non-default remote works too, and supporting that is the time-consuming part. I'm totally on the same page with you: remote_config is useful, I'm just not sure it is worth investing in right now since config is already there.

> Is there any way it will be used for anything besides remote config? I can't see any other config section that would make sense to override.

Remote config is the prime use case, but config is the most powerful mechanism that can allow one to get out of very sticky situations (e.g. if someone drops Git from the repo but still wants to use it, they can set no_scm through this, which seems useful during deployment).
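
As a rough sketch of the escape hatch described here, the override below sets `no_scm` and a remote option in one dict. It assumes `no_scm` refers to DVC's `core.no_scm` option; `myremote` and the token are placeholders, and the `dvc.api.open` call is left as a comment so the snippet runs without DVC installed.

```python
from typing import Any, Dict

# Assumption: the `no_scm` mentioned above lives under the `core` section.
override: Dict[str, Any] = {"core": {"no_scm": True}}

# Remote options can travel in the same dict ("myremote" is a placeholder name).
override["remote"] = {"myremote": {"token": "placeholder-token"}}

# Intended usage with the `config` parameter added in this PR:
# with dvc.api.open("data", config=override) as f:
#     contents = f.read()
print(override)
```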

Collaborator

@dberenbaum dberenbaum Jun 20, 2023

What's the level of effort to also add remote_config? Sorry, just repeating the question now 🤦

Collaborator

Let's discuss when we talk tomorrow

Member

> config is the most powerful mechanism that can allow one to get out of very sticky situations

@efiop, I think you are looking at it from a maintainer's point of view rather than a user's. We can provide a powerful mechanism in DVCFileSystem; read/open should be simpler. remote_config= solves most of the problem imo.

Creating a new remote and/or providing a way to configure a DVC repo is more of a niche use case. Also, I don't feel comfortable exposing the whole config schema to users. It has a large surface area and can have unintended effects when not used correctly.

Development

Successfully merging this pull request may close these issues.

api: support config/credentials passing
3 participants