External dependencies are shared/global state #2448

Closed
somewacko opened this issue Aug 28, 2019 · 5 comments
Labels
question I have a question?

Comments

@somewacko

Another conversation from Discord: This page describes how to set up external dependencies, where stages can write to and read from remote objects.

Could this cause problems for projects with multiple users in the same repo? For example, say two teammates have different versions of the codebase checked out, and both run the same dvc repro command at the same time, where that command produces and reads from an external dependency. Wouldn't this cause a race condition where one user might end up accessing the other's object (or worse, the wrong object gets cached)? AFAIK there's no locking mechanism for external objects.

If so, this should probably be noted in the docs (happy to volunteer!). But this also poses a hairy problem for projects with external dependencies, since these objects are always volatile. How should this sort of thing be handled?

@efiop
Contributor

efiop commented Aug 28, 2019

@zo7 Correct. In cases where dependencies might change, we recommend that users define external workspaces. For example, you could do something like

dvc remote add myworkspace s3://bucket/$USER --local # notice the --local, which writes to .dvc/config.local which is not tracked by git
dvc run -d remote://myworkspace/data ...

This way your workspace is dynamic and depends on the user, so there are no race conditions between different users (unless they are using more than 1 repo instance at the same time, of course).
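
A minimal sketch of how that per-user setup might be scripted as a one-time step after cloning. The function name `user_workspace_url` is hypothetical, `s3://bucket` is just the placeholder bucket from the commands above, and the guard around `dvc` is an assumption so the script does nothing outside a DVC repo:

```shell
#!/bin/sh
# Sketch only: builds a per-user workspace URL and registers it locally.
# "user_workspace_url" is a hypothetical helper, not part of DVC.

# Print "<base>/<user>"; the user defaults to $USER.
user_workspace_url() {
    base="$1"
    user="${2:-${USER:-}}"
    if [ -z "$user" ]; then
        echo "error: no user name given and \$USER is unset" >&2
        return 1
    fi
    # Strip one trailing slash from the base so the result has a single "/".
    printf '%s/%s\n' "${base%/}" "$user"
}

# Only touch the DVC config when dvc is installed and this is a DVC repo.
if command -v dvc >/dev/null 2>&1 && [ -d .dvc ]; then
    url=$(user_workspace_url "s3://bucket") &&
        # --local writes to .dvc/config.local, which git does not track.
        dvc remote add myworkspace "$url" --local
fi
```

Because the remote lives in `.dvc/config.local`, each teammate would need to run a step like this once per clone before `dvc repro`.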

> If so, this should probably be noted in the docs (happy to volunteer!)

We would really appreciate that :) The docs repo is https://github.com/iterative/dvc.org, let us know if you need any help.

@efiop efiop added the question I have a question? label Aug 28, 2019
@ghost

ghost commented Aug 28, 2019

Just to clarify, it would be the same cache but different remotes ^

@somewacko
Author

Oh cool! I didn't realize you could set a local workspace like that.

Would it make sense to suggest setting up local workspaces in the docs? Would this be a step that users have to do manually when setting up the repo before they can dvc repro? (i.e. git clone -> dvc pull -> dvc remote)

> unless they are using more than 1 repo instance at the same time

How do you think this should work for a server or CI system that calls dvc repro on multiple branches all at once? Maybe set up a local workspace based on the git SHA?

@efiop
Contributor

efiop commented Aug 29, 2019

@zo7

> Would it make sense to suggest setting up local workspaces in the docs? Would this be a step that users have to do manually when setting up the repo before they can dvc repro? (i.e. git clone -> dvc pull -> dvc remote)

Yes, it would make total sense. I guess it would fit somewhere around https://dvc.org/doc/user-guide/external-outputs . It seems like it would work as part of iterative/dvc.org#108. Maybe you would even consider contributing a doc based on your experience? 😉

> How do you think this should work for a server or CI system that calls dvc repro on multiple branches all at once? Maybe set up a local workspace based on the git SHA?

Yes, that would totally work 🙂
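
A rough sketch of what the SHA-based variant could look like in a CI job. The helper name `sha_workspace_url` and the `s3://bucket/ci` prefix are assumptions made for illustration; only `dvc remote add ... --local` and `dvc repro` come from this thread:

```shell
#!/bin/sh
# Sketch only: one external workspace per commit, so CI jobs running on
# different commits do not read or write each other's remote objects.
# "sha_workspace_url" is a hypothetical helper, not part of DVC.

# Print "<base>/<sha>"; the SHA defaults to the current HEAD.
sha_workspace_url() {
    base="$1"
    sha="${2:-$(git rev-parse HEAD 2>/dev/null)}"
    if [ -z "$sha" ]; then
        echo "error: could not determine the git SHA" >&2
        return 1
    fi
    # Strip one trailing slash from the base so the result has a single "/".
    printf '%s/%s\n' "${base%/}" "$sha"
}

# Only run when dvc is installed and this is a DVC repo.
if command -v dvc >/dev/null 2>&1 && [ -d .dvc ]; then
    url=$(sha_workspace_url "s3://bucket/ci") &&
        dvc remote add myworkspace "$url" --local &&
        dvc repro
fi
```

Two jobs building the same commit would still share a workspace under this scheme; if that can happen, appending a job-specific identifier from the CI system to the URL is one possible workaround.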

@efiop
Contributor

efiop commented Nov 18, 2019

Closing. Docs ticket is iterative/dvc.org#108

@efiop efiop closed this as completed Nov 18, 2019