LocalCluster does not respect memory_limit keyword when it is large #7155

Closed
mrocklin opened this issue Oct 18, 2022 · 7 comments · Fixed by #7160

Comments

@mrocklin
Member

from dask.distributed import Client

client = Client(memory_limit="300 GB")
client.run(lambda dask_worker: dask_worker.memory_limit)
# Each worker reports 17179869184 bytes (16 GiB, the machine's total memory)
# rather than the requested 300 GB:
{'tcp://127.0.0.1:62196': 17179869184,
 'tcp://127.0.0.1:62199': 17179869184,
 'tcp://127.0.0.1:62200': 17179869184,
 'tcp://127.0.0.1:62204': 17179869184}

It seems to respect the keyword when it's lower than available memory, but not when it's greater. Granted, I don't have 1.2 TB of memory on my laptop (300 GB × 4 workers), but maybe it makes sense to allow the user to over-subscribe.

@mrocklin
Member Author

from dask.distributed import Client

client = Client(memory_limit="3 GB")
client.run(lambda dask_worker: dask_worker.memory_limit)
# A limit below the machine's memory is respected as given (3 GB per worker):
{'tcp://127.0.0.1:62281': 3000000000,
 'tcp://127.0.0.1:62282': 3000000000,
 'tcp://127.0.0.1:62283': 3000000000,
 'tcp://127.0.0.1:62284': 3000000000}

@jrbourbeau
Member

Thanks @mrocklin -- I'm able to reproduce. This behavior comes from this line:

return min(memory_limit, system.MEMORY_LIMIT)

where we cap things at system.MEMORY_LIMIT. cc @crusaderky
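
A minimal sketch (not the actual distributed source) of the capping logic referenced above, assuming system.MEMORY_LIMIT is approximated by psutil.virtual_memory().total; it shows why "300 GB" comes back as 17179869184 on a 16 GiB machine while "3 GB" is returned unchanged:

from dask.utils import parse_bytes
import psutil

# Stand-in for distributed.system.MEMORY_LIMIT (which also takes cgroup and
# rlimit caps into account); on a 16 GiB laptop this is 17179869184.
system_memory_limit = psutil.virtual_memory().total

def capped_memory_limit(memory_limit: str) -> int:
    requested = parse_bytes(memory_limit)        # "300 GB" -> 300_000_000_000
    return min(requested, system_memory_limit)   # capped at system memory

capped_memory_limit("300 GB")  # 17179869184 on a 16 GiB machine (capped)
capped_memory_limit("3 GB")    # 3000000000 (respected as given)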

@mrocklin
Member Author

It may also be that this is correct behavior. It was surprising, though (reasonably so, I think). It's subjective whether we want to let users do dumb things. I think we do want to let them opt in to being dumb, but I don't have a strong opinion here.

@jrbourbeau
Member

I'm not sure whether we should let users over-subscribe. This may lead to bad behavior with, for example, the active memory manager; I suspect @crusaderky will have insight here. Regardless, if we keep the current behavior, it'd be good to emit a warning (or something similar) letting the user know they've requested more memory than is available and that we're capping at the system memory. That way there will at least be some visibility into what's happening.
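
A hypothetical sketch of such a warning (not the actual change merged in #7160); the helper name and its arguments are made up for illustration:

import warnings

def cap_memory_limit(requested: int, system_limit: int) -> int:
    # Hypothetical helper: keep the current capping behavior, but tell the
    # user when their requested memory_limit exceeds the detected system memory.
    if requested > system_limit:
        warnings.warn(
            f"Requested memory_limit of {requested} bytes exceeds the "
            f"detected system memory of {system_limit} bytes; "
            f"capping at {system_limit}.",
            stacklevel=2,
        )
        return system_limit
    return requested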

@fjetter
Member

fjetter commented Oct 19, 2022

I consider this expected behavior. Is there any sane use case for allowing larger values?

From a UX POV we should raise a warning if this happens such that the user knows what's going on.


This is also roughly related to #6895, which discusses making system.MEMORY_LIMIT even stricter.

@crusaderky
Collaborator

To clarify: if you have 4 workers, the current cap will let you set for each worker the whole memory of your host. This is potentially desirable, as the workload may for whatever reason be very unbalanced. Beyond that, I cannot think of any sensible use case.

AMM ReduceReplicas does not take memory_limit into account.
rebalance() (and, in the future, AMM Rebalance), however, does use the memory limit to set its bands of operation; setting a limit that cannot even theoretically be reached will de facto disable rebalancing.
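
As an illustration (numbers made up, and assuming the default distributed.worker.memory.rebalance.sender-min fraction of 0.30), an unreachable memory_limit means a worker can never cross the band at which rebalance() would move data off it:

memory_limit = 300e9        # per-worker limit the user requested
physical_ram = 16 * 2**30   # what the machine actually has (16 GiB)
sender_min = 0.30           # assumed fraction of memory_limit a worker must
                            # exceed before rebalance() treats it as a sender

sender_threshold = sender_min * memory_limit   # 90 GB
print(sender_threshold > physical_ram)         # True: the threshold is above
                                               # physical RAM, so this worker
                                               # never sends data in rebalance()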

@mrocklin
Member Author

> From a UX POV we should raise a warning if this happens such that the user knows what's going on.

Sounds like a fine outcome to me.
