Use dependency injection for proc memory mocks #6004
Conversation
b9d780b to 4f7c70f (Compare)
IMHO, the whole thing feels over-designed here.
The failures (for reference: https://github.com/dask/distributed/runs/5693287345?check_suite_focus=true) are both caused by:
- having "memory_limit": "1 GB"
- the worker is already beyond 1 GB RAM when the worker starts, as it runs in the main process
- the test is about pause
- hence, the pause may kick in before the monkey patch of the instance can happen.

The solution is very simple: just make memory_limit and all mocked measures safely larger than what the whole test suite will ever consume, e.g. 10 GB.
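A rough sketch of what that suggestion could look like in a test (illustrative only: the test name, and the get_process_memory attribute used to fake the measurement, are placeholders rather than the actual hook in the code base):

```python
from distributed.utils_test import gen_cluster


@gen_cluster(
    client=True,
    nthreads=[("127.0.0.1", 1)],
    worker_kwargs={"memory_limit": "10 GB"},  # safely above real test usage
    config={"distributed.worker.memory.pause": 0.8},
)
async def test_pause_on_mocked_memory(c, s, a):
    # Pretend the worker uses 9 GB: 9/10 = 0.9 crosses the 0.8 pause
    # threshold deterministically, while real process memory never could.
    # NOTE: the attribute being overridden here is a placeholder.
    a.monitor.get_process_memory = lambda: 9 * 10**9
    ...
```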
or self.memory_limit is None
):
Suggested change:
-or self.memory_limit is None
-):
+):
+assert self.memory_limit is not None
+assert self.memory_terminate_fraction is not False
A few lines below we're calculating memory / self.memory_limit, i.e. if memory_limit is None we can skip memory_monitor / return early.
So, why remove the self.memory_limit is None check from the if clause, and why add these asserts? If these asserts guard against a concerning edge case, we should probably rather add a unit test.
It's not an edge case; the whole method will never be scheduled if memory_limit is disabled. See lines ~140 in __init__. There are already several tests for it.
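For context, a simplified sketch of the pattern that reply describes (approximated names and signatures, not the actual distributed source): the periodic check is only registered when a limit is configured, so the monitor body can safely assume memory_limit is set.

```python
import psutil
from tornado.ioloop import PeriodicCallback


def current_process_memory() -> int:
    # Placeholder for however process RSS is actually measured.
    return psutil.Process().memory_info().rss


class WorkerMemoryManagerSketch:
    def __init__(self, worker, memory_limit=None, interval_ms=100):
        self._worker = worker
        self.memory_limit = memory_limit
        if self.memory_limit is not None:
            # The callback is only ever scheduled when a limit exists, so
            # memory_monitor() never runs with memory_limit disabled.
            pc = PeriodicCallback(self.memory_monitor, interval_ms)
            worker.periodic_callbacks["memory_monitor"] = pc

    def memory_monitor(self):
        assert self.memory_limit is not None  # guaranteed by __init__
        frac = current_process_memory() / self.memory_limit
        ...  # pause / spill / terminate decisions based on frac
```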
distributed/worker_memory.py (Outdated)
@@ -302,32 +313,46 @@ def __init__(
    dask.config.get("distributed.worker.memory.monitor-interval"),
    default=None,
)
self.nanny = weakref.ref(nanny)
This is self._worker above; could you make it consistent?
Also, please add it to the attribute declarations above in both classes.
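Illustrative only (not the PR's actual code): the kind of consistency being asked for, with the attribute declared at class level and the owner held the same way as self._worker in WorkerMemoryManager.

```python
from __future__ import annotations

import weakref


class NannyMemoryManagerSketch:
    # Declared up front, mirroring the attribute declarations in the
    # worker-side manager class.
    memory_limit: int | None
    _nanny: weakref.ref

    def __init__(self, nanny, memory_limit: int | None = None):
        self.memory_limit = memory_limit
        # Same naming scheme as self._worker, held as a weak reference.
        self._nanny = weakref.ref(nanny)
```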
I understand this concern. I still think we should go through with this. I don't think mocking something mid-flight is a stable way of testing. The fact that the pause may already kick in before the test even starts, regardless of how the settings are configured, is not stable and is harder to maintain. Even if setting the parameters properly avoids this issue, a lot of knowledge is required to A) identify the situation and B) set the proper values. Instead of a monkeypatch, a fake object, i.e. a fully functional object with a few simplifications (e.g. the process memory), is typically a more robust design, and this is what I am proposing here. Providing the type of …
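A minimal sketch of the fake-object idea argued for here (class and parameter names are illustrative, not the API this PR actually adds): the process-memory probe becomes a constructor argument, so a test can inject a deterministic fake before anything starts running, instead of monkeypatching a live instance.

```python
import psutil


class ProcessMemoryProbe:
    """Real measurement, used in production."""

    def __call__(self) -> int:
        return psutil.Process().memory_info().rss


class FakeMemoryProbe:
    """Fully functional stand-in with a simplified measurement."""

    def __init__(self, rss: int):
        self.rss = rss

    def __call__(self) -> int:
        return self.rss


class MemoryManagerSketch:
    def __init__(self, memory_limit: int, probe=None):
        self.memory_limit = memory_limit
        self.probe = probe or ProcessMemoryProbe()  # injection point

    def managed_fraction(self) -> float:
        return self.probe() / self.memory_limit


# In a test, the fake is wired in at construction time, so there is no
# window where real measurements can trigger pause before the mock lands:
manager = MemoryManagerSketch(
    memory_limit=10 * 2**30, probe=FakeMemoryProbe(9 * 2**30)
)
assert manager.managed_fraction() == 0.9
```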
Unit Test Results: 11 files -1, 11 suites -1, 6h 3m 0s ⏱️ -33m 14s. For more details on these failures and errors, see this check. Results for commit 0ec2a91. ± Comparison against base commit 6dd928b.
The log of the failed tests (https://github.com/dask/distributed/runs/5693287345?check_suite_focus=true) states:
It looks very explicit to me.
I think it's a safe statement that the main process of our test suite will never reach 10 GB permanent RAM. You can use 100 GB or 1 TB if you want to remove all ambiguity.
The pause may kick in on any test decorated by …
I would agree for the actual production code. I disagree for unit tests - you're just making them harder to maintain.
You're adding complexity to Worker for the sole purpose of unit testing. This, to me, is an antipattern.
Closing in favour of #6055
The mocks introduced in #5878 are unstable in tests like test_fail_to_pickle_spill because the mocks only register once the test is entered. With the extremely small memory limit and the extremely small monitor-interval, the event loop appears to be stressed out sufficiently that the workers never actually come up in time and the test times out. Increasing the interval is one option (a sketch follows below). I encountered this during #5910.
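For illustration, the "increase the interval" option could be expressed through the config key that also appears in the diff above; the value is only an example:

```python
import dask

# Relax the memory monitor interval so the event loop is polled less often
# while the workers come up; "500ms" is an arbitrary example value.
with dask.config.set({"distributed.worker.memory.monitor-interval": "500ms"}):
    ...  # start the cluster / run the test body under the relaxed interval
```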
The mocks themselves are not to blame here; rather, the problem is that we're patching an already-initialized instance after it has been started. IMO, the proper way is dependency injection with a fake object instead of a monkeypatch.
This introduces a dependency injection pattern for the WorkerMemoryManager and defines a specific mock target as motivated already in #5870 (comment)