dvc import from s3://... fails when s3 provided by 3rd party provider. #1280

Closed
bwalsh opened this issue Oct 30, 2018 · 9 comments

@bwalsh
Contributor

bwalsh commented Oct 30, 2018

  • Standard aws client works, if endpoint url provided.
    • aws --endpoint-url https://<my-endpoint> s3 ls <my-path>
  • DVC repository works, endpoint_url provided
['remote "minio"']
url = s3://<my-path>
endpointurl = <my-endpoint>

However, dvc import fails because there is no way to specify the endpoint URL.
More details: https://gist.github.com/bwalsh/1afea0a2499b5e81c507dbc24521038e
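
A minimal sketch of why the configured remote works while a bare s3:// URL does not: the endpoint lives only in the remote's config section, so anything that never reads that section falls back to the default AWS endpoint. The helper name `s3_client_kwargs` is hypothetical and this is not DVC's actual code; `endpoint_url` is the keyword boto3 accepts when creating a client, but boto3 is not imported here.

```python
import configparser

# Config copied from the report above; the placeholders are left as-is.
CONFIG_TEXT = """
['remote "minio"']
url = s3://<my-path>
endpointurl = <my-endpoint>
"""


def s3_client_kwargs(config_text: str, remote_name: str) -> dict:
    """Return extra keyword arguments an S3 client would need for a remote.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Section name exactly as it appears between the brackets in the file.
    section = "'remote \"{}\"'".format(remote_name)
    endpoint = parser.get(section, "endpointurl", fallback=None)
    # No configured endpoint means the client would use its default (AWS).
    return {"endpoint_url": endpoint} if endpoint else {}


print(s3_client_kwargs(CONFIG_TEXT, "minio"))
print(s3_client_kwargs(CONFIG_TEXT, "unknown"))
```

A plain s3:// URL passed to dvc import corresponds to the second call: there is no section to read an endpoint from.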

@ghost ghost self-assigned this Oct 30, 2018
@ghost ghost added the bug Did we break something? label Oct 30, 2018
@ghost ghost added this to the Queue milestone Oct 30, 2018
@efiop
Contributor

efiop commented Oct 31, 2018

Hi @bwalsh !

You can actually use that remote in the dvc import command, e.g. dvc import remote://minio/path/to/file file (NOTE: path/to/file should be specified relative to the remote URL, e.g. if the URL is s3://mybucket/mydir and you want the file s3://mybucket/mydir/myfile, then it should be remote://remote/myfile). Could you please try it out and see if it works for you?

Thanks,
Ruslan
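
The path rule in the note above can be sketched as follows. This is only an illustration of the described behaviour, not DVC's implementation; the function `resolve_remote_url` and the `remotes` mapping are hypothetical names, and the remote is assumed to be called minio with URL s3://mybucket/mydir.

```python
from urllib.parse import urlparse


def resolve_remote_url(url: str, remotes: dict) -> str:
    """Expand remote://<name>/<path> against the URL configured for <name>.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme != "remote":
        return url  # already a concrete URL, nothing to expand
    base = remotes[parsed.netloc]  # KeyError if the remote is not configured
    rel = parsed.path.lstrip("/")
    # The path is appended to the remote URL, so it must be relative to it.
    return base.rstrip("/") + "/" + rel if rel else base


remotes = {"minio": "s3://mybucket/mydir"}
print(resolve_remote_url("remote://minio/myfile", remotes))
```

Under these assumptions the call yields s3://mybucket/mydir/myfile, which is why the bucket and directory must not be repeated after the remote name.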

@ghost ghost removed their assignment Oct 31, 2018
@ghost ghost removed the bug Did we break something? label Oct 31, 2018
@shcheklein
Member

@bwalsh if you want to use one bucket for remote cache and another bucket for remote dependencies, I believe you will need to create a separate remote using dvc remote add - https://dvc.org/doc/commands-reference/remote#add and use it instead of the minio one.

@shcheklein
Member

@efiop @MrOutis Just something to think about. I don't have any good suggestions yet. It's a little bit confusing that we use the same notion of remotes both for storing cache and for specifying external dependencies. The term "remote" came from git, where it has only one specific purpose - as a central place to store commits (cache).

@bwalsh
Contributor Author

bwalsh commented Oct 31, 2018

@efiop @MrOutis @shcheklein thanks everyone for the quick response.

I tried setting up the second remote, adding the endpoint url, and importing. I can confirm the import remote://... works, thanks.

Left to my own devices I wouldn't have guessed it was a possibility.
I wish aws would fix their cli config; this issue w/ awscli has been open for a while. In the interim the remote:// solution works, but is semantically confusing.

@efiop
Contributor

efiop commented Oct 31, 2018

@bwalsh Sorry for a dumb question, but I just want to clarify. Are you aware that dvc import not only downloads the file, but also tracks the remote source of that file and re-downloads it if it has changed?

@bwalsh
Contributor Author

bwalsh commented Oct 31, 2018

@efiop: yes, I was aware. The semantics of dvc import are clear; it's the semantics of remote:// when used by import that are not.

@ghost

ghost commented Oct 31, 2018

@efiop, I'm going to label this as documentation to improve/clarify the remote:// "protocol"

@ghost ghost added the documentation label Oct 31, 2018
@ghost

ghost commented Nov 8, 2018

Related: iterative/dvc.org#108

@efiop
Contributor

efiop commented Nov 24, 2018

Let's close this one, since we have the same issue open on dvc.org.
