Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular updates (Apr 6) #1110

Merged
merged 8 commits into from
Apr 7, 2020
Merged

Regular updates (Apr 6) #1110

merged 8 commits into from
Apr 7, 2020

Conversation

jorgeorpinel
Copy link
Contributor

@jorgeorpinel jorgeorpinel commented Apr 6, 2020

@shcheklein shcheklein temporarily deployed to dvc-landing-2020-04-06-ouvosng April 6, 2020 15:15 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-2020-04-06-ouvosng April 7, 2020 01:08 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-2020-04-06-ouvosng April 7, 2020 01:31 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-2020-04-06-ouvosng April 7, 2020 08:29 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-2020-04-06-ouvosng April 7, 2020 18:09 Inactive
Copy link
Contributor Author

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so per https://discordapp.com/channels/485586884165107732/563406153334128681/696834803831406614 I reviewed a use case and a user guide to try clarifying external vs shared cache, but found some more questions, asked here. Please check my changes and questions 🙂

Comment on lines +42 to +49
Now, ensure that the cached directories and files have appropriate permissions,
so that they can be accessed by your colleagues (assuming their users are
members of the same group):

```dvc
$ sudo find /path/to/dvc-cache -type f -exec chmod 0664 {} \;
$ sudo find /path/to/dvc-cache -type d -exec chmod 0775 {} \;
$ sudo chown -R myuser:ourgroup /path/to/dvc-cache/
$ sudo find /home/shared/dvc-cache -type d -exec chmod 0775 {} \;
$ sudo find /home/shared/dvc-cache -type f -exec chmod 0664 {} \;
$ sudo chown -R myuser:ourgroup /home/shared/dvc-cache/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these commands will always work. What if the shared cache dir has some files your user can't access (if someone didn't do this process before, for example)? Wouldn't the command fail?

Comment on lines +52 to 59
## Configure the external shared cache

Tell DVC to use the directory we've set up above as an shared cache location by
running:
Tell DVC to use the directory we've set up above as the <abbr>cache</abbr> for
your <abbr>project</abbr>:

```dvc
$ dvc config cache.dir /path/to/dvc-cache
$ dvc config cache.dir /home/shared/dvc-cache
```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have all these exact and reproducible command guides in a use case? This section seems more like a user guide to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree ... most of it should be part of the shared cache article in the UG. And use case should be probably more generic "Optimizing data management" which can cover this one (multiple people working on a single box), k8s scenario - multiple machines + NAS, etc. We are long overdue on this. There were contributors that made an attempt, but didn't get it right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see we have #429 for something like this already. And #986 is also related. Should we try to consolidate them into a single issue and include the comments from this review?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, these two mostly about improving the existing case - but they are relevant, up to you to consolidate or not ... there should be tickets related to this from other angles - shared cache on NAS/NFS for example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other related links:

https://discuss.dvc.org/t/setup-dvc-to-work-with-shared-data-on-nas-server/180/4
#732
#455
#563

and others ... someone should a first step systematize and suggest the way to structure this properly

Comment on lines -38 to +41
> Avoid using the same remote location that you are using for `dvc push`,
> `dvc pull`, `dvc fetch` as external cache for your external outputs, because
> it may cause possible file hash overlaps: The hash value of a data file in
> external storage could collide with that generated locally for another file.
> Avoid using the same location of the
> [remote storage](/doc/command-reference/remote) that you have for `dvc push`
> and `dvc pull` for external outputs or as external cache, because it may cause
> file hash overlaps: The hash value of a data file in external storage could
> collide with the one generated locally for another file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved this note but it's still pretty strange. Difficult to understand what it is about. The ambiguity of the term "remote" is giving us problems here. Not sure how to address, maybe just remove the note?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it's a very complicated to simplify it :) the whole topic is very advanced and very technical. Let's keep it for now with your revisions and get back to it when we have more time to spend on all this data management problems.

@jorgeorpinel jorgeorpinel marked this pull request as ready for review April 7, 2020 18:15
@jorgeorpinel jorgeorpinel requested a review from shcheklein April 7, 2020 18:15
@shcheklein
Copy link
Member

@jorgeorpinel just minor comments, overall looks good to merge.

@jorgeorpinel jorgeorpinel requested a deployment to dvc-landing-2020-04-06-ouvosng April 7, 2020 22:55 Abandoned
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-2020-04-06-ouvosng April 7, 2020 22:57 Inactive
@jorgeorpinel jorgeorpinel merged commit 058679b into master Apr 7, 2020
@jorgeorpinel jorgeorpinel deleted the 2020-04-06 branch May 5, 2020 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants