Ensure garbage collection of distributed.scheduler.TaskState instances #6364
Conversation
wait_profiler()
gc.collect()
I used this myself already, but it is actually not safe. wait_profiler polls for the profiler thread to not be running. If we rely on ordinary ref-count based object collection, that is sufficient: as soon as the profiler thread pauses, ref-count based collection cleans up all references and we're good.
However, most of the TaskStates are part of a densely connected, self-referencing data structure. These self-referencing cycles are not necessarily a problem, since gc.collect can detect them, break them, and clean up. The catch is that the background thread may already be running again after wait_profiler returns, since we're only polling and not actually stopping the thread.
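To make the race concrete, here is a small, self-contained sketch (not distributed's actual profiler code) of why polling is not enough: a sampling thread that grabs frames via sys._current_frames() keeps everything those frames reference alive, and it may start a new sample between the poll returning and gc.collect() running.

import gc
import sys
import threading
import time

def sampler(stop: threading.Event, samples: list) -> None:
    # Stand-in for the profiler's watch loop: holding on to frames keeps
    # every object they reference (e.g. TaskState instances) alive.
    while not stop.is_set():
        samples.append(sys._current_frames())
        time.sleep(0.01)

stop = threading.Event()
samples: list = []
threading.Thread(target=sampler, args=(stop, samples), daemon=True).start()

# "Waiting" by polling gives no guarantee: the sampler may grab fresh
# frames between this sleep and the collection below.
time.sleep(0.02)
gc.collect()
stop.set()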
We'd rather need something like

with no_profiler():
    # Some magic (e.g. a lock) ensures that the profiler thread cannot
    # take a sample while this ctx manager is held
    gc.collect()
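A minimal sketch of what such a context manager could look like, assuming a module-level lock that the profiler's watch loop also takes around each sample (profiler_lock is illustrative, not existing distributed API):

import contextlib
import gc
import threading

# Illustrative lock; the idea is that the profiler's watch loop acquires
# the same lock around each sample (see the suggestion below).
profiler_lock = threading.Lock()

@contextlib.contextmanager
def no_profiler():
    # While this is held, the profiler cannot take a sample, so the
    # collection below runs without the profiler pinning any frames.
    with profiler_lock:
        yield

with no_profiler():
    gc.collect()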
Maybe this?
import gc

from distributed import profile

with profile.lock:
    gc.collect()
# profile.py
import sys
import threading

lock = threading.Lock()

def _watch(...):
    ...
    with lock:
        # Sampling only happens while the lock is held, so holding the lock
        # elsewhere keeps the watch thread from grabbing frames.
        frame = sys._current_frames()[thread_id]
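Put together, a test could guard the collection with that lock and assert collection via a weakref, for example (a sketch only, using a stand-in class rather than a real TaskState, and assuming the lock above has been added as distributed.profile.lock):

import gc
import weakref

from distributed import profile

class Node:
    # Stand-in for a densely connected, self-referencing structure such as
    # TaskState, which references other tasks via dependencies/dependents.
    def __init__(self):
        self.dependents = [self]

obj = Node()
ref = weakref.ref(obj)
del obj                     # ref counting alone cannot free the cycle

with profile.lock:          # the profiler cannot sample while we collect
    gc.collect()            # the cycle collector breaks the self-reference

assert ref() is None, "instance leaked"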
(it seems simpler to reference a lock directly rather than make a context manager)
Yes, I was typing the context manager first and while typing realized this is not occult magic but simply a lock :)
Unit Test Results: 15 files ±0, 15 suites ±0, 9h 4m 59s ⏱️ +1h 59m 40s. For more details on these failures and errors, see this check. Results for commit 97c862d. ± Comparison against base commit 36e9946.
Partially addresses #6250
pre-commit run --all-files