dvc import/pull from private DVC remote fails despite configured access credentials #8544

Closed
sisp opened this issue Nov 9, 2022 · 3 comments

sisp (Contributor) commented Nov 9, 2022

Bug Report

Description

I tried to import data with dvc import ... from a data registry whose DVC remote is private, i.e. it requires access credentials, but the locally configured credentials for this DVC remote do not appear to be used. A brief investigation suggests that the Git-versioned DVC config of the imported repository, which contains the config of the corresponding DVC remote, is not merged with the local config of the project that imports the data.

Reproduce

I've created a GitHub project that contains a reproducible example: https://github.com/sisp/dvc-import-auth-example

A brief overview:

  • Project A is a Git repository (for simplicity, a branch) that contains a data registry.
  • The DVC remote of Project A is a GitLab generic packages repository associated with the private GitLab project https://gitlab.com/sisp/dvc-import-auth-example. It must be private so that credentials are required for accessing the data in the GitLab generic packages repository.
  • Project B is a Git repository (for simplicity, a branch) that imports data from Project A.
  • A GitLab deploy token with scope read_package_registry is used in Project B for pulling data from the private DVC remote associated with Project A.

This is the error I'm getting:

$ dvc import --rev project-a -o project-a-data.txt --no-download git@github.com:sisp/dvc-import-auth-example.git data.txt
Importing 'data.txt (git@github.com:sisp/dvc-import-auth-example.git)' -> 'project-a-data.txt'

$ dvc pull
ERROR: configuration error - HTTP 'custom' authentication require both 'custom_auth_header' and 'password'
ERROR: HTTP 'custom' authentication require both 'custom_auth_header' and 'password'
Learn more about configuration settings at <https://man.dvc.org/remote/modify>.

Expected

DVC should be able to pull the imported data from the DVC remote associated with Project A by using the access credentials configured locally in Project B.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.34.0 (pip)
---------------------------------
Platform: Python 3.9.13 on Linux-5.13.0-48-generic-x86_64-with-glibc2.31
Subprojects:
	dvc_data = 0.25.3
	dvc_objects = 0.12.2
	dvc_render = 0.0.12
	dvc_task = 0.1.4
	dvclive = 1.0.1
	scmrepo = 0.1.3
Supports:
	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git

pmrowla (Contributor) commented Nov 10, 2022

This is the same issue as #4604. dvc import currently does not read the local configuration in project B; it only uses the credentials specified for the default remote in project A.

Closing as a duplicate. Please follow the linked issue for updates.
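
To make the behavior described above concrete, here is a minimal Python sketch of layered config resolution. It is an illustrative model only, not DVC's actual implementation: the remote name `registry`, the example URL, the `Deploy-Token` header value, and all function names are assumptions made for this sketch. It shows that the versioned config of project A alone lacks a password, so the 'custom' auth check fails unless the importer's local config is merged in.

```python
from typing import Dict, List

# remote name -> settings for that remote
Config = Dict[str, Dict[str, str]]


def merge_levels(levels: List[Config]) -> Config:
    """Merge config levels in order; later levels override earlier ones."""
    merged: Config = {}
    for level in levels:
        for remote, settings in level.items():
            merged.setdefault(remote, {}).update(settings)
    return merged


def check_custom_auth(settings: Dict[str, str]) -> None:
    """Model of the check behind the error message quoted in the report."""
    if settings.get("auth") == "custom":
        if "custom_auth_header" not in settings or "password" not in settings:
            raise ValueError(
                "HTTP 'custom' authentication require both "
                "'custom_auth_header' and 'password'"
            )


# Hypothetical contents of each level (all values are placeholders).
GLOBAL_CFG: Config = {}
PROJECT_A_REPO_CFG: Config = {
    "registry": {
        "url": "https://example.com/packages",
        "auth": "custom",
        "custom_auth_header": "Deploy-Token",
    }
}
PROJECT_B_LOCAL_CFG: Config = {"registry": {"password": "<DEPLOY_TOKEN>"}}


def resolve_for_import(include_importer_local: bool) -> Dict[str, str]:
    """Resolve the remote of project A as seen during an import."""
    levels = [GLOBAL_CFG, PROJECT_A_REPO_CFG]
    if include_importer_local:
        # The behavior the report expects; per the comment above,
        # dvc import does not currently do this.
        levels.append(PROJECT_B_LOCAL_CFG)
    settings = merge_levels(levels)["registry"]
    check_custom_auth(settings)
    return settings


def import_succeeds(include_importer_local: bool) -> bool:
    """Return True if the remote resolves without an auth config error."""
    try:
        resolve_for_import(include_importer_local)
    except ValueError:
        return False
    return True
```

In this model, `import_succeeds(False)` corresponds to the reported failure and `import_succeeds(True)` to the expected behavior.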

pmrowla closed this as not planned Nov 10, 2022

sisp (Contributor, Author) commented Nov 10, 2022

I see. That's a major blocker, though, because it makes DVC unusable for managing data in private data registries. Providing credentials in Project A is not an option because they would have to be versioned in Git, which obviously makes no sense.

sisp (Contributor, Author) commented Nov 12, 2022

A workaround is to configure the credentials for Project A's DVC remote in the global config instead of the local config, e.g.:

dvc remote modify --global <PROJECT_A_REMOTE> password <PASSWORD>

However, DVC complains when only the credentials of the remote are added to the global config without the URL, even though the URL is already configured in the versioned config of Project A. This is not a blocker, but it is a suboptimal user experience. I understand why DVC complains about the missing URL: at the beginning of the pull command it doesn't know that the URL is configured in the imported Project A, and it would first have to clone the Git repo of Project A to retrieve it. Still, it's unintuitive and redundant to have to fully configure the remote of Project A again in the global config.
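
For reference, the full global configuration that this workaround seems to require can be sketched as a small Python helper that only builds the command lines, without running them. This is a sketch under assumptions: the function name and its parameters are hypothetical, and setting `auth` and `custom_auth_header` globally is inferred from the error message above rather than confirmed. The `dvc remote add` and `dvc remote modify` subcommands and their `--global` flag are existing DVC CLI features.

```python
from typing import List


def global_workaround_commands(
    remote: str, url: str, auth_header: str, token: str
) -> List[List[str]]:
    """Build the dvc commands that fully configure a remote globally.

    Hypothetical helper: it only returns argument lists; pass each one
    to subprocess.run(...) to execute it on a machine with DVC installed.
    """
    return [
        # The URL is repeated here even though Project A already versions it.
        ["dvc", "remote", "add", "--global", remote, url],
        ["dvc", "remote", "modify", "--global", remote, "auth", "custom"],
        [
            "dvc", "remote", "modify", "--global", remote,
            "custom_auth_header", auth_header,
        ],
        # The secret lives only in the user's global config, not in Git.
        ["dvc", "remote", "modify", "--global", remote, "password", token],
    ]
```

Calling it with placeholders such as `<PROJECT_A_REMOTE>`, `<URL>`, a header name, and `<PASSWORD>` yields four commands, the last of which matches the one shown above.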
