
Ensure garbage collection of distributed.scheduler.TaskState instances #6364

Conversation

hendrikmakait
Member

Partially addresses #6250

  • Tests added / passed
  • Passes pre-commit run --all-files

@hendrikmakait changed the title from "Ensure garbage collection of scheduler `TaskState instances" to "Ensure garbage collection of scheduler TaskState instances" on May 18, 2022
@hendrikmakait changed the title from "Ensure garbage collection of scheduler TaskState instances" to "Ensure garbage collection of distributed.scheduler.TaskState instances" on May 18, 2022
Comment on lines +1798 to +1799
wait_profiler()
gc.collect()
Member

I've used this myself already, but it is actually not safe. wait_profiler merely polls until the profiler thread is not currently running. If we rely on ordinary refcount-based object collection, that is sufficient: as soon as the profiler thread pauses, refcounting cleans up all references and we're good.

Most of the TaskState instances, however, are part of a densely connected, self-referencing data structure. These reference cycles are not necessarily a problem, since gc.collect() can detect them, break them, and clean up. However, the background thread may already be running again by the time wait_profiler returns, because we are only polling and not actually stopping the thread, so the collection can still race with the profiler.
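
For illustration, a polling wait of roughly this shape (a simplified sketch, not the actual wait_profiler implementation) only observes that the profiler thread is momentarily idle; nothing keeps it idle afterwards:

import time

def wait_until_profiler_idle(profiler_is_sampling, timeout=2.0):
    # Hypothetical polling helper: spin until the profiler thread is
    # observed to be idle, or until the timeout expires.
    deadline = time.monotonic() + timeout
    while profiler_is_sampling() and time.monotonic() < deadline:
        time.sleep(0.01)
    # From here on nothing holds the profiler back: it may call
    # sys._current_frames() again before the caller's gc.collect() runs,
    # grabbing fresh references to frames that keep the TaskState cycles alive.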

Member

What we'd need instead is something like

with no_profiler():
    # Some magic (e.g. a lock) ensures that the profile thread cannot watch while this ctx manager is held
    gc.collect()
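
A minimal sketch of what that could look like, assuming the profiler's sampling loop acquires the same lock around sys._current_frames() (no_profiler and _profile_lock are illustrative names here, not existing distributed APIs):

import gc
import threading
from contextlib import contextmanager

# Hypothetical module-level lock; the profiler's watch loop would acquire it
# around each sample it takes of the other threads' frames.
_profile_lock = threading.Lock()

@contextmanager
def no_profiler():
    # While this context manager is held, the profiler thread cannot take a
    # new sample, so gc.collect() cannot race with it.
    with _profile_lock:
        yield

# usage in a test
with no_profiler():
    gc.collect()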

Member

Maybe this?

# in the test
from distributed import profile

with profile.lock:
    gc.collect()

# profile.py
import threading

lock = threading.Lock()

def _watch(...):
    ...
    # take the lock around each sample so tests can exclude the profiler
    with lock:
        frame = sys._current_frames()[thread_id]

Member

(it seems simpler to reference a lock directly rather than make a context manager)

Member

Yes, I was typing out the context manager first and realized while typing that this is not occult magic but simply a lock :)

@github-actions
Contributor

Unit Test Results

    15 files ±0,  15 suites ±0,  9h 4m 59s run time (+1h 59m 40s)
 2 801 tests ±0:     1 229 passed (-1 493),     90 skipped (+12),  1 302 failed (+1 301),    180 errored (+180)
21 079 runs (+309):  9 567 passed (-10 280), 1 002 skipped (+80),  9 242 failed (+9 241),  1 268 errored (+1 268)

For more details on these failures and errors, see this check.

Results for commit 97c862d. Comparison against base commit 36e9946.
