
Putting dask-worker-space in /tmp is unfriendly on shared systems #6748

Closed
wence- opened this issue Jul 20, 2022 · 23 comments · Fixed by #7054

@wence-
Contributor

wence- commented Jul 20, 2022

Since d59500e (#6658), launching workers can fail on shared systems if someone else happens to be running at the same time (since they will have created /tmp/dask-worker-space and we won't have write permissions).

A friendlier approach would probably be to use tempfile.mkdtemp to create the directory. This has the disadvantage that if a cluster fails and doesn't clean up, a new run would not clean out the old lock directory.
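The `tempfile.mkdtemp` approach can be sketched with the standard library alone. `make_worker_space` and its prefix are hypothetical names, not distributed's API. `mkdtemp` picks an unused random name and creates the directory so that only the creating user can access it, which avoids the permission clash. The same randomness is why a later run cannot recognize a crashed cluster's leftover directory.

```python
import os
import tempfile


def make_worker_space(base_dir=None, prefix="dask-worker-space-"):
    """Create a private, uniquely named scratch directory (hypothetical helper)."""
    base_dir = base_dir or tempfile.gettempdir()
    # mkdtemp picks a random, unused name and creates the directory so that it
    # is readable, writable and searchable only by the creating user.
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)


a = make_worker_space()
b = make_worker_space()
# Two concurrent runs (or users) never collide...
print(os.path.basename(a), os.path.basename(b))
# ...but if a run crashes before removing its directory (e.g. with shutil.rmtree),
# nothing in the random name tells a later run that the directory is stale.
```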

@jakirkham
Member

It is worth noting that most Linux filesystems provide /tmp via tmpfs, which effectively provides a RAM disk. As a result, data spilled from memory to tmpfs doesn't actually leave memory. This could cause a lot of churn, with Workers trying to spill and free memory without any meaningful effect on memory pressure.

@jrbourbeau
Member

@crusaderky, any thoughts here? Pinging you as it looks like you pushed #6658 over the finish line.

@fjetter
Member

fjetter commented Aug 18, 2022

If this causes problems, we can also revert #6658

I still believe CWD is not a good choice for these things but it might be the lesser evil

@jakirkham
Member

Yeah, I was trying to think of what a better default would be, but I agree it is tricky. Sometimes the home directory is shared via NFS, which makes it a bad scratch space location.

Some of this is also a documentation issue (#4000). Currently we have this, but it shows how to configure a single worker. Users would likely benefit more from knowing about setting "temporary-directory" in dask.config, or using the DASK_TEMPORARY_DIRECTORY environment variable to set it as part of shell or container startup. Maybe these options could be listed here?
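As a sketch of the environment-variable route: whatever launches the workers (a batch script, a container entrypoint, or a `subprocess` call) can hand them an environment in which `DASK_TEMPORARY_DIRECTORY` points at node-local scratch space. The helper name `worker_env` and the `/scratch` root are assumptions for illustration. Dask maps the variable onto the `temporary-directory` config key; `dask.config.set({"temporary-directory": ...})` is the rough in-process equivalent.

```python
import getpass
import os


def worker_env(scratch_root):
    """Return a copy of os.environ with DASK_TEMPORARY_DIRECTORY set (hypothetical helper)."""
    try:
        user = getpass.getuser()
    except Exception:
        # getuser() can fail in containers with no USER/LOGNAME variable and an
        # unknown uid; fall back to a fixed name rather than crashing.
        user = "unknown"
    env = dict(os.environ)  # copy, so the launching process is unaffected
    env["DASK_TEMPORARY_DIRECTORY"] = os.path.join(scratch_root, user, "dask")
    return env


# "/scratch" is a hypothetical node-local path; pass env=env to subprocess.run
# (or export the same variable in a job script) when starting the worker.
env = worker_env("/scratch")
print(env["DASK_TEMPORARY_DIRECTORY"])
```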

@fjetter
Member

fjetter commented Aug 18, 2022

Yes, I think the OP about permissions in a shared FS could be handled by better documentation. I think the shared system is a sufficiently rare case that we could point to the documentation and ask users to configure temporary-directory in their dask/distributed.yaml.

What worries me is your comment about /tmp being RAM-backed (tmpfs), since this defeats the purpose of spilling, and something like this should not be the default.

Is there a pythonic way to infer whether or not a directory is backed by memory (tmpfs)? If so, we could try to use /tmp first and fall back to CWD if that's the case.
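Python's standard library has no portable call that reports a filesystem's type (`os.statvfs` does not include it), but on Linux the kernel's mount table in `/proc/mounts` does. The sketch below, with hypothetical helper names, finds the longest mount point that contains a path and checks whether its type is `tmpfs`; elsewhere it returns `None`. If a dependency is acceptable, `psutil.disk_partitions(all=True)` also exposes an `fstype` field per mount (with the default `all=False`, memory filesystems such as tmpfs are skipped).

```python
import os
from typing import Optional


def fstype_from_mounts(path: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the longest mount point in `mounts_text` containing `path`."""
    target = os.path.realpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Fields are: device, mount point, fstype, options, ...; the kernel
        # escapes spaces in mount points as \040.
        mount_point, fstype = fields[1].replace("\\040", " "), fields[2]
        inside = target == mount_point or target.startswith(mount_point.rstrip("/") + "/")
        # ">=" so that a later mount on the same point (which shadows the
        # earlier one) wins.
        if inside and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fstype
    return best_type


def mount_fstype(path: str) -> Optional[str]:
    """Linux-only: look `path` up in /proc/mounts; None if that file is unreadable."""
    try:
        with open("/proc/mounts") as f:
            return fstype_from_mounts(path, f.read())
    except OSError:
        return None


def is_tmpfs(path: str) -> bool:
    return mount_fstype(path) == "tmpfs"


print(is_tmpfs("/tmp"))
```

A worker could then check `is_tmpfs(tempfile.gettempdir())` before deciding whether to fall back to CWD.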

@crusaderky
Collaborator

> launching workers can fail on shared systems if someone else happens to be running at the same time (since they will have created /tmp/dask-worker-space and we won't have write permissions).

Right; if writing to /tmp it should be /tmp/dask-worker-space-$USER.

> A friendlier approach would probably be to use tempfile.mkdtemp to create the directory.

We're already doing that for the directories inside dask-worker-space.
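Combining the two ideas, a per-user parent directory could hold the per-worker `mkdtemp` directories. This is a minimal sketch with hypothetical names (`per_user_worker_dir`, the `worker-` prefix), not distributed's actual layout. `getpass.getuser()` consults `LOGNAME`, `USER`, `LNAME` and `USERNAME` before falling back to the password database, so it inherits the caveat that, in some containers, every user resolves to the same name. The ownership check refuses a parent that another account created first.

```python
import getpass
import os
import tempfile


def per_user_worker_dir(base_dir=None):
    """Create <tmp>/dask-worker-space-<user>/worker-XXXXXXXX (hypothetical layout)."""
    base_dir = base_dir or tempfile.gettempdir()
    try:
        user = getpass.getuser()
    except Exception:  # e.g. no USER/LOGNAME and a uid missing from /etc/passwd
        user = "unknown"
    parent = os.path.join(base_dir, f"dask-worker-space-{user}")
    # Workers of the same user share the parent; mode only applies on creation.
    os.makedirs(parent, mode=0o700, exist_ok=True)
    # Refuse a parent that someone else created first (POSIX only).
    if hasattr(os, "getuid") and os.stat(parent).st_uid != os.getuid():
        raise PermissionError(f"{parent} is owned by another user")
    # Each worker still gets its own uniquely named directory inside the parent.
    return tempfile.mkdtemp(prefix="worker-", dir=parent)


path = per_user_worker_dir()
print(path)
```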

> It is worth noting that most Linux filesystems provide /tmp via tmpfs, which effectively provides a RAM disk.

Not quite - it's a ramdisk backed by the swap file.

> Sometimes the home directory is shared via NFS

Or, on cloud services, it can be an EBS volume or equivalent.
NFS also causes a lot of problems with locks - the test suite used to fail a lot before #6658 if your workspace was on NFS.

> I still believe CWD is not a good choice for these things but it might be the lesser evil

I don't think we should revert #6658, due to the test suite.
If we do want to go back to CWD, we should:

  • leave Nanny and Worker as they are (default to /tmp)
  • change dask-worker to default to CWD, e.g. always pass the local_directory parameter
  • change all tests that start dask-worker to temporarily move CWD to /tmp through a fixture
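The third step could look roughly like the context manager below, which a pytest fixture might wrap; pytest's built-in `tmp_path` and `monkeypatch.chdir` fixtures achieve the same effect. `temporary_cwd` is a hypothetical name, and it uses a fresh directory under the system temp dir (usually /tmp) rather than /tmp itself, so concurrent tests do not share a worker space.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_cwd():
    """Run the enclosed block with a fresh temporary directory as CWD (hypothetical helper)."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            yield tmp
        finally:
            # Step out of tmp before TemporaryDirectory removes it, and give the
            # caller back its original working directory.
            os.chdir(old_cwd)


# Usage: anything that writes relative to CWD inside the block, such as a
# CWD-defaulting worker started by a test, lands under the temp directory.
cwd_before = os.getcwd()
with temporary_cwd() as tmp_dir:
    cwd_inside = os.getcwd()
cwd_after = os.getcwd()
```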

@fjetter
Member

fjetter commented Aug 18, 2022

> I don't think we should revert #6658, due to the test suite.
> If we do want to go back to CWD, we should:
>
> • leave Nanny and Worker as they are (default to /tmp)
> • change dask-worker to default to CWD, e.g. always pass the local_directory parameter
> • change all tests that start dask-worker to temporarily move CWD to /tmp through a fixture

I like this proposal, but I don't know how this impacts the various deployment tools. cc @jacobtomlinson

@fjetter
Member

fjetter commented Aug 18, 2022

>> It is worth noting that most Linux filesystems provide /tmp via tmpfs, which effectively provides a RAM disk.

> Not quite - it's a ramdisk backed by the swap file.

That means it would actually swap to disk if memory pressure is high? That would be fine, I guess.

@wence-
Contributor Author

wence- commented Aug 18, 2022

>>> It is worth noting that most Linux filesystems provide /tmp via tmpfs, which effectively provides a RAM disk.

>> Not quite - it's a ramdisk backed by the swap file.

> That means it would actually swap to disk if memory pressure is high? That would be fine, I guess.

Assuming there is a swap file configured.
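Whether spilling to a tmpfs-backed /tmp relieves memory pressure therefore depends on swap being present. On Linux this can be read from the `SwapTotal` line of `/proc/meminfo`; the sketch below uses hypothetical helper names and returns `None` where that file is unavailable. `psutil.swap_memory().total` is a cross-platform alternative if that dependency is acceptable.

```python
from typing import Optional


def swap_total_kib(meminfo_text: str) -> Optional[int]:
    """Return SwapTotal from /proc/meminfo-formatted text (reported in kB), or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return None


def swap_configured() -> Optional[bool]:
    """Linux-only: whether any swap exists; None if it cannot be determined."""
    try:
        with open("/proc/meminfo") as f:
            total = swap_total_kib(f.read())
    except OSError:
        return None  # not Linux, or /proc unavailable
    return None if total is None else total > 0


print(swap_configured())
```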

@wence-
Contributor Author

wence- commented Aug 18, 2022

> Right; if writing to /tmp it should be /tmp/dask-worker-space-$USER.

This seems like a reasonable default (even in the single-user case) that would certainly avoid the initial problem. What are the downsides compared to sticking with dask-worker-space?

@crusaderky
Collaborator

>> Not quite - it's a ramdisk backed by the swap file.

> Assuming there is a swap file configured.

If you mounted tmpfs on /tmp but didn't mount a swap partition, your OS is poorly set up and it is not an application problem.

@jacobtomlinson
Member

> • leave Nanny and Worker as they are (default to /tmp)
> • change dask-worker to default to CWD, e.g. always pass the local_directory parameter

@crusaderky I'm not excited about having different defaults here. Some deployment tools use dask-worker and some use dask-spec, which invokes the Nanny, so I wouldn't feel great about the inconsistency.

> /tmp/dask-worker-space-$USER

@crusaderky This seems reasonable, although it may break down on containerized systems where $USER resolves to the same value for everyone, such as root or jovyan.

> I think the shared system is a sufficiently rare case

@fjetter Is it? This seems very common on HPC.

@wence-
Contributor Author

wence- commented Aug 18, 2022

> If you mounted tmpfs on /tmp but didn't mount a swap partition, your OS is poorly set up and it is not an application problem.

While a niche use case for dask (I suspect), this is unfortunately often the default in HPC systems where you don't have control over the OS setup.

@wence-
Contributor Author

wence- commented Aug 18, 2022

> @crusaderky This seems reasonable, although it may break down on containerized systems where $USER resolves to the same value for everyone, such as root or jovyan.

If it's inside the container things should be fine, no?

@jacobtomlinson
Member

jacobtomlinson commented Aug 18, 2022

> While a niche use case for dask (I suspect), this is unfortunately often the default in HPC systems where you don't have control over the OS setup.

I wouldn't call HPC a niche Dask case.

> If it's inside the container things should be fine, no?

Often /tmp is mounted in.

@quasiben
Member

FWIW, the user survey from last year shows that HPC tooling is the second most popular way of launching dask (ssh is first).

@jrbourbeau just opened dask/community#269 to get stats for this year.

@crusaderky
Collaborator

crusaderky commented Aug 18, 2022

>> If it's inside the container things should be fine, no?

> Often /tmp is mounted in.

Holy loss of segregation, Batman!
Jokes aside, won't all the containers share the same user on the host VM?

@jacobtomlinson
Member

It depends on the HPC system. If it uses Singularity or something similar, the usernames will be correct (and therefore unique). But if the container runtime does user namespacing like Docker does, then all users will have the same username.

So if they all have the same username and the same /tmp, then /tmp/dask-worker-space-$USER won't help.

@crusaderky
Collaborator

/tmp/dask-worker-space-$UUID then.

Back to tmpfs vs. CWD: I'm very concerned about Jupyter notebooks stored on NFS. With a CWD default, such notebooks will spill to NFS when they start a LocalCluster. I think that for the devbox use case, tmpfs is a better default than CWD. On the flip side, I think it's reasonable to ask users who are doing production deployments to think their parameters through.

@jacobtomlinson
Member

jacobtomlinson commented Aug 18, 2022

With user namespacing the UUID will also be the same for all users so I'm afraid that doesn't help.

I agree with the concerns about it being CWD, but the concerns around tmpfs are also valid. Neither is a particularly good option for certain (large) groups of our users. I commonly see HPC users set this option to somewhere like /scratch, but I rarely see cloud/Kubernetes users set it, so maybe /tmp plus better documentation is the lesser of two evils here.

An alternative would be to always warn if the user doesn't explicitly set this, to encourage users to always set it. Or maybe just warn if there are existing dask worker space folders in the default location?
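The second variant, warning when worker-space directories already exist in the default location, could be sketched with the standard library as follows. The function name, glob pattern and message text are assumptions, not distributed's behavior; the usage runs against a throwaway directory instead of the real temp dir.

```python
import glob
import os
import tempfile
import warnings


def warn_on_existing_worker_spaces(base_dir=None, pattern="dask-worker-space*"):
    """Warn if worker-space directories already exist in base_dir; return them."""
    base_dir = base_dir or tempfile.gettempdir()
    # glob.escape guards against glob metacharacters in the base path itself.
    candidates = glob.glob(os.path.join(glob.escape(base_dir), pattern))
    found = sorted(p for p in candidates if os.path.isdir(p))
    if found:
        warnings.warn(
            f"Found existing worker space directories in {base_dir}: {found}. "
            "Consider setting the 'temporary-directory' config option "
            "(or DASK_TEMPORARY_DIRECTORY) explicitly.",
            UserWarning,
            stacklevel=2,
        )
    return found


# Usage against a throwaway directory rather than the real temp dir.
with tempfile.TemporaryDirectory() as base, warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_matches = warn_on_existing_worker_spaces(base)  # nothing yet: no warning
    os.mkdir(os.path.join(base, "dask-worker-space"))
    stale_matches = warn_on_existing_worker_spaces(base)  # one match: one warning
```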

@crusaderky
Collaborator

> With user namespacing the UUID will also be the same for all users so I'm afraid that doesn't help.

I meant uuid.uuid4(), not $UID. Sorry for the confusion.

@fjetter
Member

fjetter commented Aug 18, 2022

> @fjetter Is it? This seems very common on HPC.

Could probably be caught by fixing this once in dask-jobqueue. I doubt that there are many HPC users with a custom solution.
I was mostly thinking about users that just spin up a cluster using the default settings, i.e. without any configuration or any wrapper, e.g. LocalCluster on a big VM or a laptop, etc.

Just to be clear, I don't have a strong preference here. I introduced this change because I was annoyed about CWD but wasn't aware of any other implications. I'm cool with either solution

@jacobtomlinson
Member

HPC users commonly use dask-jobqueue, dask-ssh, dask-mpi and dask-gateway. So if we wanted to handle it downstream those would be the places to do it.


7 participants