config: document the way config files play together, add some how-tos or examples? #1368

Closed
kopytjuk opened this issue Jul 11, 2019 · 7 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) duplicate This issue or pull request already exists. type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@kopytjuk

kopytjuk commented Jul 11, 2019

EDIT by @shcheklein

Repurposed from a few issues - both come from some misunderstanding/lack of proper docs on how config files (default vs local vs system vs global) play together.

#1368 (comment)


Currently, there are only two ways to handle secrets (e.g. connection strings, tokens, etc.):

  1. Use .dvc/config.local file. Since this file is not shared among developers in Git, a team has to agree on a local configuration to pull/push data.
  2. Use a single environment variable, e.g. AZURE_STORAGE_CONNECTION_STRING - but if one has different storage destinations, one variable is not enough.

It would be very handy to use .dvc/config with our own placeholders for secrets:

['remote "my-azure-remote"']
url = azure://my-container/dvc/
connection_string = $MY_SECRET

If the right secret is set, we can share .dvc/config without revealing secrets in Git.
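To make the proposal above concrete, here is a minimal Python sketch of placeholder expansion. It is not DVC code and nothing in this thread says DVC supports such placeholders; the function name `load_with_placeholders` and the fake environment mapping are hypothetical, and Python's `configparser` only stands in for whatever parser DVC actually uses.

```python
import configparser
from string import Template

# The shared config from the proposal, with a placeholder instead of the secret.
SHARED_CONFIG = """
['remote "my-azure-remote"']
url = azure://my-container/dvc/
connection_string = $MY_SECRET
"""

# configparser keeps the single quotes as part of the section name.
REMOTE_SECTION = "'remote \"my-azure-remote\"'"


def load_with_placeholders(text, env):
    """Parse INI text and replace $NAME placeholders with values from env.

    Hypothetical helper: it only illustrates the idea from the issue.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    resolved = {}
    for section in parser.sections():
        resolved[section] = {
            # substitute() raises KeyError if a placeholder has no value in env.
            key: Template(value).substitute(env)
            for key, value in parser[section].items()
        }
    return resolved


# A fake environment for the example; real code would pass os.environ.
fake_env = {"MY_SECRET": "example-connection-string"}
config = load_with_placeholders(SHARED_CONFIG, fake_env)
print(config[REMOTE_SECTION]["connection_string"])
```

With this approach the committed file would carry only the placeholder name, and each developer would supply the value through their own environment.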

@steffansluis
Contributor

(at the request of @pared after speaking on Discord)

I have a data registry backed by Azure blob storage, hosted on GitHub. The credentials for Azure/the definition of the remote (let's call it "data-registry") are in .dvc/config.local (in the data-registry repo), and as such are not committed to version control.

I now want to use data from it in a project, which I would expect to be able to do with something like: dvc import [email protected]:myorganization/data-registry.git data/data.json. It clones the git repo, finds data.json.dvc, checks the local cache (doesn't find it), and then breaks on remote_conf = repo.config["remote"][name.lower()] with KeyError: 'data-registry'. I have the remote defined in config.local same as in the data-registry, but it never seems to get to the point of using that information.

I am using the --local option because I don't want to put my credentials under version control (i.e. the connection_string when it comes to Azure). Since the remote is not defined in the original repo but is defined in the project repo, I would expect DVC to use the remote definition that actually exists, albeit not in the original repo. Taking this principle further, if I had defined it in both the original repo and the project repo, I would expect DVC to override the config from the original repo with the one provided in the project repo.

A workaround is possible by using an environment variable to supply the connection string.

@efiop
Contributor

efiop commented May 23, 2020

@steffansluis You could also use your --system or --global configs to define those remotes. Those will be used by get/import.

@shcheklein
Member

Looks like we have two different use cases here: one for the regular workflow, one for get and import. Maybe it's better to split this into two tickets (or even close it if we can't come up with some action points).

@kopytjuk have you tried to use two configs simultaneously?

The way I understood your concern is that if we define a remote (name and all settings) in a local config (for security concerns) all other team members have to agree on the url and potentially name for the remote as well.

It's not a well-known or documented feature of DVC, but DVC merges sections with the same name from different configs. What it means in our case:

we put

['remote "my-azure-remote"']
url = azure://my-container/dvc/

in the regular .dvc/config and it is shared across all team members.

at the same time, we can put

['remote "my-azure-remote"']
connection_string = <conn string>

in the local config - .dvc/config.local.

This way every team member can specify a connection string per remote, while everyone agrees on the URL being used.
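The merging of same-named sections can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: it uses the standard library's `configparser`, not DVC's own config loader, and the helper name `merge_configs` is hypothetical. It shows the shared and local snippets above ending up as one combined section.

```python
import configparser

# Contents of the shared .dvc/config (committed to Git).
SHARED = """
['remote "my-azure-remote"']
url = azure://my-container/dvc/
"""

# Contents of .dvc/config.local (kept out of Git).
LOCAL = """
['remote "my-azure-remote"']
connection_string = <conn string>
"""

# configparser keeps the single quotes as part of the section name.
REMOTE_SECTION = "'remote \"my-azure-remote\"'"


def merge_configs(*texts):
    """Read several INI texts in order; later texts add to or override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in texts:
        # Reading a section that already exists extends it instead of replacing it.
        parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


merged = merge_configs(SHARED, LOCAL)
print(merged[REMOTE_SECTION])
```

The result contains both `url` (from the shared file) and `connection_string` (from the local file) under the same remote section, which is the behavior described above.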

Would it solve the issue for you, @kopytjuk ?


@steffansluis

In your case it is the same idea, but you should be using the --system or --global configs. It makes sense, since dvc get, for example, is by definition a "global" command: you can run it outside of a repo, so it does not even have access to .dvc/config.local.
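A rough Python model of why the level matters for get and import is sketched below. It is not DVC's implementation: the level names, the precedence order (system lowest, local highest) and the function `effective_remote` are assumptions made for illustration. The point is that a setting stored at a level the command cannot see never reaches the merged result.

```python
# Assumed precedence, lowest to highest; treat it as an illustration only.
PRECEDENCE = ("system", "global", "repo", "local")


def effective_remote(name, configs, available):
    """Merge the settings for one remote across the config levels that are visible."""
    merged = {}
    for level in PRECEDENCE:
        if level in available:
            # Higher levels add to or override what lower levels defined.
            merged.update(configs.get(level, {}).get(name, {}))
    return merged


# Case 1: the remote is defined only in the registry's config.local, which is
# not committed, so a fresh clone of the registry cannot see it.
only_local = {
    "local": {"data-registry": {"url": "azure://my-container/dvc/",
                                "connection_string": "<conn string>"}},
}
seen_from_clone = effective_remote(
    "data-registry", only_local, available={"system", "global", "repo"}
)

# Case 2: the URL is committed and the secret lives in the user's global config.
split = {
    "repo": {"data-registry": {"url": "azure://my-container/dvc/"}},
    "global": {"data-registry": {"connection_string": "<conn string>"}},
}
seen_with_global = effective_remote(
    "data-registry", split, available={"system", "global", "repo"}
)

print(seen_from_clone)
print(seen_with_global)
```

In the first case nothing is found for the remote, which mirrors the missing-remote error reported earlier in the thread; in the second case the committed URL and the globally stored connection string combine.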

Please let us know what you think.

@steffansluis
Contributor

@shcheklein I am now indeed using --global to define the connection_string for my remotes; that works nicely for dvc import (and presumably dvc get as well). I think the config behavior could use some documentation and maybe a slight rethinking to make it a little more intuitive?

I discussed it with @efiop a bit. The main point of the current implementation is to avoid collisions in the config, as that would lead to strange and hard-to-debug behavior. There seems to be some resemblance to how package managers like apt/yum deal with this, e.g. "mirrors" (https://discordapp.com/channels/485586884165107732/565699007037571084/713859443204554792).

I think it makes sense to close or "rework" this issue into a general polishing of the remotes/config, although it would be even less actionable at this point. Using --system or --global should cover the use cases that this issue is about. It might be nice to document the current behavior in the meantime.

@shcheklein
Member

@steffansluis okay, repurposing and moving this to iterative/dvc.org for now! Thanks for the feedback.

@shcheklein shcheklein changed the title Remote secret handling in .dvc/config config: document the way config files play together, add some how-tos or examples? May 27, 2020
@shcheklein shcheklein transferred this issue from iterative/dvc May 27, 2020
@shcheklein shcheklein added A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels May 27, 2020
@kopytjuk
Author

It's not a well known or documented feature of DVC, but DVC merges sections with same name from different configs. What it means in our case ...

Hey, thank you for your idea - that "merging" behaviour was not known to me - seems like a good solution.

@jorgeorpinel
Contributor

Checking back on this: for now I made a small update to the dvc config ref. (see #3359), but the larger change will be to address #340, which includes this one. Closing as dupe.

@jorgeorpinel jorgeorpinel added duplicate This issue or pull request already exists. status: stale You've been groomed! and removed status: stale You've been groomed! labels Mar 15, 2022
5 participants