Putting dask-worker-space in /tmp is unfriendly on shared systems #6748
Comments
It is worth noting that most Linux filesystems provide …
@crusaderky any thoughts here? Pinging you as it looks like you pushed #6658 over the finish line.
If this causes problems, we can also revert #6658. I still believe CWD is not a good choice for these things, but it might be the lesser evil.
Yeah, I was trying to think of what a better default would be, but agree it is tricky. Sometimes the home directory is shared via NFS, which makes it a bad scratch-space location. Some of this is also a documentation issue (#4000). Currently we have this, but it shows how to configure a single worker, whereas users likely benefit more from knowing about setting …
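For reference, a minimal sketch of configuring the scratch location globally via Dask's config system (the `temporary-directory` key is what the worker consults; the path is illustrative):

```python
import dask

# Point Dask's scratch space at fast, per-user local storage instead of an
# NFS-backed home directory or CWD. "/local/scratch" is an illustrative path.
dask.config.set({"temporary-directory": "/local/scratch"})
```

The same key can also live in a dask.yaml config file, which is likely the friendlier route for site-wide deployments.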
Yes, I think the OP's problem with permissions on a shared FS could be handled by better documentation. I think the shared system is a sufficiently rare case that we could point to the documentation and ask users to configure … What worries me is your comment about the memory mapping, since this obviously defeats the purpose of spilling, and something like this should not be the default. Is there a Pythonic way to infer whether or not a directory is memory mapped? If so, we could try to use /tmp first and fall back to CWD in that case.
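For what it's worth, a best-effort sketch of such a check on Linux, parsing /proc/mounts (the `is_ram_backed` helper is hypothetical, not existing Dask code):

```python
import os

def is_ram_backed(path):
    """Return True if `path` lives on a tmpfs/ramfs mount (Linux only)."""
    path = os.path.realpath(path)
    best, best_fs = "", ""
    try:
        with open("/proc/mounts") as f:
            for line in f:
                # Fields: device mountpoint fstype options dump pass.
                _, mountpoint, fs, *_ = line.split()
                if path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/"):
                    if len(mountpoint) > len(best):  # longest matching prefix wins
                        best, best_fs = mountpoint, fs
    except OSError:
        return False  # no /proc/mounts, so presumably not Linux
    return best_fs in ("tmpfs", "ramfs")

# Try /tmp first, fall back to the current working directory if it is RAM-backed.
scratch = os.getcwd() if is_ram_backed("/tmp") else "/tmp"
```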
- Right; if writing to /tmp it should be … We're already doing that for the directories inside dask-worker-space.
- Not quite - it's a ramdisk backed by the swap file. Or on cloud services it can be an EBS volume or equivalent.
- I don't think we should revert #6658, due to the test suite.
I like this proposal, but I don't know how this impacts the various deployment tools. cc @jacobtomlinson
That means it would actually swap to disk if memory pressure is high? That would be fine, I guess.
Assuming there is a swap file configured.
This seems like a reasonable default (even in the single-user case) which would certainly avoid the initial problem. What are the downsides compared to sticking with …
If you mounted tmpfs on /tmp but didn't mount a swap partition, your OS is poorly set up and it is not an application problem.
@crusaderky I'm not excited about having different defaults here. Some deployment tools use …
@crusaderky This seems reasonable, although it may break down on containerized systems where …
@fjetter Is it? This seems very common on HPC.
While a niche use case for Dask (I suspect), this is unfortunately often the default on HPC systems where you don't have control over the OS setup.
If it's inside the container things should be fine, no?
I wouldn't call HPC a niche Dask case. Often …
FWIW, the user survey from last year shows that HPC tooling is the second most popular way of launching Dask (SSH is first).
@jrbourbeau just opened dask/community#269 to get stats for this year.
Holy loss of segregation, Batman!
It depends on the HPC: if it is using Singularity or something similar, the usernames will be correct (and therefore unique). But if the container runtime does user namespacing like Docker does, then all users will have the same username. So if they all have the same username, and the same …
Back to tmpfs vs. cwd: I'm very concerned about Jupyter notebooks stored on NFS. Such notebooks will spill to NFS when they start a LocalCluster. I think that for the devbox use case tmpfs is a better default than cwd. On the flip side, I think it's reasonable to ask users doing production deployments to think their parameters through.
With user namespacing the … I agree with the concerns about it being CWD, but the concerns around tmpfs are also valid. Neither is a particularly good option for certain (large) groups of our users. I commonly see HPC users set this option to somewhere like … An alternative would be to always warn if the user doesn't explicitly set this, and try to encourage it to always be set. Or maybe just warn if there are existing dask-worker-space folders in the default location?
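A rough sketch of the warn-on-unset idea (illustrative only; none of this is existing Dask behaviour):

```python
import os
import tempfile
import warnings

import dask

# Only consider warning when the user has not configured the location.
if not dask.config.get("temporary-directory", None):
    default = os.path.join(tempfile.gettempdir(), "dask-worker-space")
    # A directory we cannot write to was likely created by another user.
    if os.path.exists(default) and not os.access(default, os.W_OK):
        warnings.warn(
            f"{default} exists but is not writable; consider setting the "
            "temporary-directory config option to per-user scratch space"
        )
```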
I meant …
Could probably be caught by fixing this once in dask-jobqueue. I doubt that there are many HPC users with a custom solution. Just to be clear, I don't have a strong preference here. I introduced this change because I was annoyed about CWD, but wasn't aware of any other implications. I'm cool with either solution.
HPC users commonly use …
Original post:

Since d59500e (#6658), launching workers can fail on shared systems if someone else happens to be running at the same time (since they will have created /tmp/dask-worker-space and we won't have write permissions). A friendlier approach would probably be to use tempfile.mkdtemp to create the directory. This has the disadvantage that if a cluster fails and doesn't clean up, a new run would not clean out the old lock directory.
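For illustration, a minimal sketch of that mkdtemp approach, with best-effort cleanup registered at exit (a sketch, not the actual Dask implementation):

```python
import atexit
import shutil
import tempfile

# mkdtemp creates a uniquely named directory with mode 0o700, so concurrent
# users on the same machine cannot collide on ownership or permissions.
worker_space = tempfile.mkdtemp(prefix="dask-worker-space-")

# Best-effort cleanup on normal interpreter exit; a crashed cluster would
# still leave the directory (and its lock files) behind, as noted above.
atexit.register(shutil.rmtree, worker_space, ignore_errors=True)
```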