guide: Using a shared cache in networked filesystems (NAS/NFS, etc) #4303

Open
skshetry opened this issue Feb 8, 2023 · 9 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: guide Content of /doc/user-guide p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately.

Comments

@skshetry
Member

skshetry commented Feb 8, 2023


We seem to be getting more questions about NAS/NFS being slow. The common fix we suggest is to set index.dir and state.dir to a directory on a non-networked (local) filesystem:

dvc config state.dir /path/to/state
dvc config index.dir /path/to/index
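
A slightly fuller sketch of the same idea, in case it helps whoever writes the page. The directory under $HOME is a hypothetical choice (any path on a local disk should do), and using --local is an assumption on my part rather than part of the suggestion above: it writes to .dvc/config.local, which is not committed, so each machine can keep its own paths.

```shell
# Hypothetical machine-local directory; any path on a non-networked disk works.
LOCAL_DVC_DIR="${HOME}/.cache/dvc-local"
mkdir -p "${LOCAL_DVC_DIR}/state" "${LOCAL_DVC_DIR}/index"

# Apply the settings only when run from the root of a DVC repository.
# --local stores them in .dvc/config.local instead of the shared .dvc/config.
if [ -d .dvc ] && command -v dvc >/dev/null 2>&1; then
    dvc config --local state.dir "${LOCAL_DVC_DIR}/state"
    dvc config --local index.dir "${LOCAL_DVC_DIR}/index"
else
    echo "not at a DVC repository root (or dvc is not installed); skipping"
fi
```

The cache itself can stay on the network share; only the state and index directories move to local disk here.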

It'd be nice to have a page to mention this, say why we need to do this, etc.

@dberenbaum dberenbaum added the p1-important Active priorities to deal within next sprints label Feb 10, 2023
@dberenbaum
Collaborator

@skshetry I think it can be part of #103?

@jorgeorpinel Can we include this after we do the major cloud providers?

@shcheklein
Member

More discussions, links here https://discord.com/channels/485586884165107732/1077945387136073869

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 8, 2023

Good idea (and it's a task in #2866). I can certainly start it, not sure how it will look as it's not technically a remote storage type but we'll see.

@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) C: guide Content of /doc/user-guide labels Mar 11, 2023
@dberenbaum
Collaborator

Good point, @jorgeorpinel. On second thought, I don't think it makes sense to mix it with #103 since it's not about remote storage, and NAS/NFS are simpler as remote storage (don't have to worry about checkouts, temp directories, sqlite databases, etc.).

I think we need a page in the data management guide about caching, which could include:

@jorgeorpinel
Contributor

A cache guide sounds more appropriate 👍🏼 NFS is already mentioned in the remote guide.

@dberenbaum
Collaborator

Now that it looks like we no longer need to recommend any specific setup for NFS, I'm not so sure this is needed. Closing for now, although maybe coming back to a cache guide in the future would make sense.

@shcheklein
Member

One item that might still be important for this scenario is using symlinks. Not sure it's worth creating a separate page for this, though.

@dberenbaum
Collaborator

dberenbaum commented Apr 4, 2023

@shcheklein One thought was to move https://dvc.org/doc/user-guide/how-to/share-a-dvc-cache to the bottom of https://dvc.org/doc/user-guide/data-management/large-dataset-optimization, since both are about handling large amounts of data in the cache and together they cover most of these topics. But I wasn't sure it's worth moving more stuff around (edit: the concern is not really the effort but whether it would be impactful). WDYT?

@shcheklein
Member

Yep, not a huge priority. Also, we can always just add a link from one page to the other; it's faster. I'm just worried that even in basic scenarios people might not realize that DVC is copying files. We used to have a warning for such cases in DVC, but I don't think it exists anymore.
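
For anyone wanting to check this on their own setup, here is a minimal sketch, not an official recipe. The link_kind helper is hypothetical (it is not part of DVC) and only inspects the file with standard tools. The cache.type option and checkout --relink are the existing DVC settings for switching link types, guarded here so they only run at a DVC repository root.

```shell
# Hypothetical helper: report how a workspace file is linked to its source.
link_kind() {
    if [ -L "$1" ]; then
        echo "symlink"
    elif [ "$(ls -ld "$1" | awk '{print $2}')" -gt 1 ]; then
        # A link count above 1 means another name points at the same inode.
        echo "hardlink"
    else
        # A plain copy and a reflink both show a link count of 1.
        echo "copy or reflink"
    fi
}

# Switch the workspace to symlinks and relink files that are already checked out.
if [ -d .dvc ] && command -v dvc >/dev/null 2>&1; then
    dvc config cache.type symlink
    dvc checkout --relink
fi
```

Calling link_kind on a tracked data file before and after relinking shows whether the workspace still holds a full copy of the data.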

@dberenbaum dberenbaum added p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately. and removed p1-important Active priorities to deal within next sprints labels May 3, 2023