Memory may not shrink fast enough #5840
Comments
Curious how much remains to do here given all the work already done?
No work has been done so far for Windows or MacOSX. The production impact is still relevant given the above. Of the impacted unit tests listed above, those in test_worker.py have been reworked to use mocks.
This is a follow-up from #5813.
Problem
The spill and pause thresholds, the Active Memory Manager, and rebalance() all rely on the process memory to shrink after calling PyFree.
This does not reliably happen on Windows and MacOSX; process memory remains allocated and, at the next PyMalloc call, it is reused.
The situation on Linux was substantially improved in the past by setting the MALLOC_TRIM_THRESHOLD_ environment variable (see https://distributed.dask.org/en/stable/worker.html#memory-not-released-back-to-the-os).
This does not completely remove the issue, particularly for highly fragmented memory, as flakiness in the unit tests demonstrates (see #5848).
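For illustration, a minimal sketch of asking the allocator to hand free pages back to the OS by hand, assuming Linux with glibc. The function name `trim_memory` and the fallback to 0 on other platforms are choices made for this sketch; on Windows and MacOSX there is no `malloc_trim`, so the sketch simply does nothing there.

```python
import ctypes
import ctypes.util
import sys


def trim_memory() -> int:
    """Ask glibc to return free heap pages to the OS.

    Returns 1 if some memory was released, 0 otherwise.
    Also returns 0 when malloc_trim is not available (non-Linux
    platforms or a non-glibc libc); that fallback is an assumption
    of this sketch, not part of any library API.
    """
    if not sys.platform.startswith("linux"):
        return 0

    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return 0

    try:
        libc = ctypes.CDLL(libc_name)
        malloc_trim = libc.malloc_trim  # missing on e.g. musl
    except (OSError, AttributeError):
        return 0

    # int malloc_trim(size_t pad); pad=0 releases as much as possible
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(0)


if __name__ == "__main__":
    print("released memory:", bool(trim_memory()))
```

On a cluster, a function like this could be run on every worker with `client.run(trim_memory)`. It only helps where glibc is the allocator, and fragmented heaps may still not shrink.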
Production impact
Possible solutions
Impacted tests
test_worker.py::test_spill_spill_threshold
test_worker.py::test_spill_hysteresis (xfails on MacOS for this reason)
test_worker.py::test_pause_executor (seems stable now with a 400MB slab of unmanaged memory; it was flaky with 250MB)
test_scheduler.py::test_memory
The tests are stable at the time of writing, but it took a lot of effort and stress testing to get them there.
Issue is mitigated in the tests by