dvc.api.DVCFileSystem Can not set credentials #9154

Closed
PythonFZ opened this issue Mar 10, 2023 · 11 comments
Assignees
Labels
A: api (Related to the dvc.api), feature request (Requesting a new feature)

Comments

@PythonFZ
Contributor

Bug Report

Description

I want to use the DVCFileSystem with custom credentials.
Ideally, I want to be able to go to any directory and use Python to load a File from a DVC repository.

@skshetry suggested that I could - for now - patch the credentials using:

from dvc.api import DVCFileSystem
from dvc.repo import Repo
url = "https://github.com/PythonFZ/IPS-Examples"
rev = "graph"
fs = DVCFileSystem(url=url, rev=rev)

# see what the configuration of a local repo looks like
repo = Repo()

# transfer the interesting keys (or all) to the fs repo
fs.repo.config = repo.config

Unfortunately, when I use the following I get NoCredentialsError, although I moved the credentials to fs.repo.config:

with fs.open("nodes/AddData/atoms.db") as f:
    print(f.readlines())
$ dvc doctor
DVC version: 2.47.0 (pip)
-------------------------
Platform: Python 3.10.9 on Linux-5.19.0-32-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 0.42.1
        dvc_objects = 0.21.1
        dvc_render = 0.2.0
        dvc_task = 0.2.0
        scmrepo = 0.1.15
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.3.0, boto3 = 1.24.59)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p5
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p5
Repo: dvc, git
@efiop
Contributor

efiop commented Mar 13, 2023

@PythonFZ Seems like you have a repo locally already, so why not just use repo.dvcfs?

@efiop added the "awaiting response" label Mar 13, 2023
@PythonFZ
Contributor Author

> @PythonFZ Seems like you have a repo locally already, so why not just use repo.dvcfs?

I only used the local repository as a template to get the correct Repo.config. Typically, I don't have the local repository.

@efiop
Contributor

efiop commented Mar 13, 2023

Sorry, looks like I'm missing some context here, but have you tried passing config= to the DVCFileSystem? Also, could you post the traceback for the error you are getting, please?

@skshetry
Member

@efiop, we don't support passing config to DVCFileSystem at the moment, as external_repo() does not allow it.

@efiop
Contributor

efiop commented Mar 13, 2023

Ah, right, that's a url. Thanks for clarifying!

@PythonFZ
Contributor Author

@efiop is this still awaiting response? It is difficult to provide a fully reproducible example because it would require an S3 remote.

@daavoo added the "feature request" and "A: api" labels and removed the "awaiting response" label Mar 14, 2023
@tibor-mach
Contributor

We discussed this with @dberenbaum in #4336 as the option to set credentials inside the python API would be very useful for integration with Databricks.

The use case is this: with a lot of our customers, Databricks notebooks are used as a prototyping environment. It would be great if the users could make use of a data registry even as they are prototyping. However, in order to do that it would be necessary to store credentials (say, to S3 or Azure Blob Storage) in the repository itself. (Databricks has a feature called "Repos" which basically clones a repository from GitHub/GitLab to a "local" Databricks environment.) Normally, one would set config.local for storing such credentials, but this environment is not really local, and keeping credentials in plain text there is similar to keeping them versioned by Git.

So the way I would like to use this feature is by storing the credentials as Databricks secrets (where they are already kept anyway when one wants Databricks to communicate with cloud storage) and then passing them to DVC through the API inside of a script. Without that, there is no secure way (that I can see) to make use of a data registry from inside Databricks Repos.
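
The workflow described above could look roughly like the following sketch. It only builds the config dictionary and assumes that the dictionary can then be handed to DVC through a `config` argument, which later comments in this thread report is possible for `DVCFileSystem`. The remote name `myremote`, the secret key names, and the secret scope `dvc` are placeholders rather than anything taken from a real setup; `account_name` and `account_key` are the option names DVC documents for Azure remotes.

```python
from typing import Callable, Dict


def azure_remote_config(
    get_secret: Callable[[str], str],
    remote_name: str = "myremote",  # placeholder: must match the remote name in the repo's .dvc/config
) -> Dict[str, dict]:
    """Build a DVC config override holding Azure credentials for one remote."""
    return {
        "remote": {
            remote_name: {
                # The secret key names below are assumptions for illustration.
                "account_name": get_secret("azure-account-name"),
                "account_key": get_secret("azure-account-key"),
            }
        }
    }


if __name__ == "__main__":
    # In a Databricks notebook the getter could wrap dbutils.secrets.get, e.g.
    #   get_secret = lambda key: dbutils.secrets.get(scope="dvc", key=key)
    # (the scope name "dvc" is an assumption). A plain dict stands in for it here.
    fake_secrets = {"azure-account-name": "acct", "azure-account-key": "key"}
    config = azure_remote_config(fake_secrets.__getitem__)
    print(sorted(config["remote"]["myremote"]))
```

Passing the secret getter in as a callable keeps the credentials out of the repository and out of any file on disk; they exist only in the in-memory dictionary.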

@efiop
Contributor

efiop commented May 13, 2023

> @efiop, we don't support passing config to DVCFileSystem at the moment, as external_repo() does not allow it.

For the record: the _external_repo helper does support it now, and we do use it in our dvcfs (#9306). It should also work with the API's dvcfs, but I haven't tested it.

@efiop
Contributor

efiop commented Jun 15, 2023

Tried it myself and it indeed works now, for example:

from dvc.api import DVCFileSystem

config = {
    "remote": {
        "myssh": {
            "passphrase": "mypass",
        },
    },
}

fs = DVCFileSystem("https://github.com/myuser/myrepo", config=config)

with fs.open("myfile") as fobj:
    print(fobj.read())
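
For the S3 remote from the original report, the same pattern might look like the sketch below. It is untested against that repository. The environment variable names are the conventional AWS ones, `access_key_id` and `secret_access_key` are the option names DVC documents for S3 remotes, and the remote name passed in is an assumption that has to match the name in the repository's .dvc/config. The helper `read_lines` is hypothetical and simply wraps the calls shown above.

```python
import os


def s3_remote_config(remote_name, env=os.environ):
    """Return a config override with S3 credentials read from environment variables."""
    return {
        "remote": {
            remote_name: {
                "access_key_id": env["AWS_ACCESS_KEY_ID"],
                "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
            }
        }
    }


def read_lines(url, rev, path, config):
    """Open `path` from the DVC repo at `url`/`rev` using the given config override."""
    # Imported lazily so s3_remote_config can be used without DVC installed.
    from dvc.api import DVCFileSystem

    fs = DVCFileSystem(url, rev=rev, config=config)
    with fs.open(path) as fobj:
        return fobj.readlines()
```

With the values from the report, a call could be `read_lines("https://github.com/PythonFZ/IPS-Examples", "graph", "nodes/AddData/atoms.db", s3_remote_config("myremote"))`, where `"myremote"` is a placeholder for the actual remote name.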

@efiop closed this as completed Jun 15, 2023
@efiop
Contributor

efiop commented Jun 15, 2023

@PythonFZ @tibor-mach Please let us know how it works for you. Thank you again for the request!

@efiop self-assigned this Jun 15, 2023
@efiop added this to DVC Jun 15, 2023
@github-project-automation bot moved this to Backlog in DVC Jun 15, 2023
@efiop moved this from Backlog to Done in DVC Jun 15, 2023
@efiop
Contributor

efiop commented Jun 15, 2023

For the record: will update the docs along with the ones for #9610.


5 participants