LocalCluster does not respect memory_limit keyword when it is large #7155

Closed
mrocklin opened this issue Oct 18, 2022 · 7 comments · Fixed by #7160

Comments

@mrocklin
Member

from dask.distributed import Client

client = Client(memory_limit="300 GB")
client.run(lambda dask_worker: dask_worker.memory_limit)
# Each worker reports 17179869184 bytes (16 GiB, the machine's total memory)
# rather than the requested 300 GB:
{'tcp://127.0.0.1:62196': 17179869184,
 'tcp://127.0.0.1:62199': 17179869184,
 'tcp://127.0.0.1:62200': 17179869184,
 'tcp://127.0.0.1:62204': 17179869184}

It seems to respect the keyword when it's lower than available memory, but not when it's greater. Granted, I don't have 1.2 TB of memory on my laptop (300 GB × 4 workers), but maybe it makes sense to allow the user to over-subscribe.

@mrocklin
Member Author

from dask.distributed import Client

client = Client(memory_limit="3 GB")
client.run(lambda dask_worker: dask_worker.memory_limit)
# A limit below the machine's memory is respected as given (3 GB per worker):
{'tcp://127.0.0.1:62281': 3000000000,
 'tcp://127.0.0.1:62282': 3000000000,
 'tcp://127.0.0.1:62283': 3000000000,
 'tcp://127.0.0.1:62284': 3000000000}

@jrbourbeau
Member

Thanks @mrocklin -- I'm able to reproduce. This behavior comes from this line:

return min(memory_limit, system.MEMORY_LIMIT)

where we cap things at system.MEMORY_LIMIT. cc @crusaderky
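
A minimal sketch (not the actual distributed source) of the capping logic referenced above, assuming system.MEMORY_LIMIT is approximated by psutil.virtual_memory().total; it shows why "300 GB" comes back as 17179869184 on a 16 GiB machine while "3 GB" is returned unchanged:

from dask.utils import parse_bytes
import psutil

# Stand-in for distributed.system.MEMORY_LIMIT (which also takes cgroup and
# rlimit caps into account); on a 16 GiB laptop this is 17179869184.
system_memory_limit = psutil.virtual_memory().total

def capped_memory_limit(memory_limit: str) -> int:
    requested = parse_bytes(memory_limit)        # "300 GB" -> 300_000_000_000
    return min(requested, system_memory_limit)   # capped at system memory

capped_memory_limit("300 GB")  # 17179869184 on a 16 GiB machine (capped)
capped_memory_limit("3 GB")    # 3000000000 (respected as given)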

@mrocklin
Member Author

It may also be that this is correct behavior. It was surprising, though (reasonably so, I think). It's subjective whether we want to let users do dumb things. I think we do want to let them opt in to being dumb, but I don't have a strong opinion here.

@jrbourbeau
Member

I'm not sure whether we should let users over-subscribe. This may lead to bad behavior with, for example, the active memory manager; I suspect @crusaderky will have insight here. Regardless, if we keep the current behavior, it'd be good to emit a warning (or something similar) letting the user know they've requested more memory than is available and that we're capping at the system memory. That way there will at least be some visibility into what's happening.
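
A hypothetical sketch of such a warning (not the actual change merged in #7160); the helper name and its arguments are made up for illustration:

import warnings

def cap_memory_limit(requested: int, system_limit: int) -> int:
    # Hypothetical helper: keep the current capping behavior, but tell the
    # user when their requested memory_limit exceeds the detected system memory.
    if requested > system_limit:
        warnings.warn(
            f"Requested memory_limit of {requested} bytes exceeds the "
            f"detected system memory of {system_limit} bytes; "
            f"capping at {system_limit}.",
            stacklevel=2,
        )
        return system_limit
    return requested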

@fjetter
Member

fjetter commented Oct 19, 2022

I consider this expected behavior. Is there any sane use case for allowing larger values?

From a UX POV we should raise a warning if this happens such that the user knows what's going on.


This is also roughly related to #6895, which discusses making system.MEMORY_LIMIT even stricter.

@crusaderky
Collaborator

To clarify: if you have 4 workers, the current cap will let you set for each worker the whole memory of your host. This is potentially desirable, as the workload may for whatever reason be very unbalanced. Beyond that, I cannot think of any sensible use case.

AMM ReduceReplicas does not take memory_limit into account.
rebalance() (and, in the future, AMM Rebalance), however, does use the memory limit to set its bands of operation; setting a limit that cannot even theoretically be reached will de facto disable rebalancing.
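
As an illustration (numbers made up, and assuming the default distributed.worker.memory.rebalance.sender-min fraction of 0.30), an unreachable memory_limit means a worker can never cross the band at which rebalance() would move data off it:

memory_limit = 300e9        # per-worker limit the user requested
physical_ram = 16 * 2**30   # what the machine actually has (16 GiB)
sender_min = 0.30           # assumed fraction of memory_limit a worker must
                            # exceed before rebalance() treats it as a sender

sender_threshold = sender_min * memory_limit   # 90 GB
print(sender_threshold > physical_ram)         # True: the threshold is above
                                               # physical RAM, so this worker
                                               # never sends data in rebalance()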

@mrocklin
Member Author

> From a UX POV we should raise a warning if this happens such that the user knows what's going on.

Sounds like a fine outcome to me.
