Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cache in iscoroutinefunction to avoid holding on to refs #5985

Merged
merged 1 commit into from
Mar 24, 2022

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Mar 23, 2022

This is currently suspected to be one of the reasons why our CI is behaving very awkwardly, e.g. why sometimes GC takes so long to run. We couldn't prove this but we know for a fact that this causes problems by accumulating DequeHandlers, see #5973

I believe we should find a method which does not hold on to references if we indeed want to cache this. However, introducing any mechanic in here that would ensure us not holding on to references is likely much slower than simply using the inspect call.

The cache was introduced in #4481 and basically reverts #4469

We don't have any benchmarks about this I could run to verify the impact. Generally, I'm very surprised to see this taking so long. Based on a microbenchmark this is very fast

Python 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:13:24)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from tornado import gen

In [2]: from distributed.utils import iscoroutinefunction

In [3]: class A:
   ...:     async def method(self):
   ...:         await asyncio.sleep(0)
   ...:         return
   ...:
   ...:     @gen.coroutine
   ...:     def gen_coro(self):
   ...:         yield gen.moment()

In [4]: %timeit iscoroutinefunction(A.method)
434 ns ± 1.74 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [5]: %timeit iscoroutinefunction(A.gen_coro)
513 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [6]: import inspect

In [7]: %timeit inspect.iscoroutinefunction(A.method)
395 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@mrocklin
Copy link
Member

Fine by me. If we need it in the future we can cache by function name or something else. Happy to wait until it's proven necessary though and proceed optimistically.

@graingert
Copy link
Member

I was hoping this would also avoid: https://github.com/dask/distributed/runs/5661844951?check_suite_focus=true#step:12:1348

--- Logging error ---
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/logging/__init__.py", line 1086, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/work/distributed/distributed/distributed/utils_perf.py", line 199, in _gc_callback
    logger.warning(
Message: 'full garbage collections took %d%% CPU time recently (threshold: %d%%)'
Arguments: (36.035444889464394, 10.0)

@github-actions
Copy link
Contributor

Unit Test Results

       12 files  +       12         12 suites  +12   6h 0m 54s ⏱️ + 6h 0m 54s
  2 669 tests +  2 669    2 587 ✔️ +  2 587    81 💤 +  81  0 ±0  1 🔥 +1 
15 926 runs  +15 926  15 066 ✔️ +15 066  858 💤 +858  1 +1  1 🔥 +1 

For more details on these errors, see this check.

Results for commit 3ef2f5c. ± Comparison against base commit b872d45.

@fjetter fjetter merged commit 9d3064a into dask:main Mar 24, 2022
@seanlaw
Copy link

seanlaw commented May 3, 2022

I was hoping this would also avoid: https://github.com/dask/distributed/runs/5661844951?check_suite_focus=true#step:12:1348

--- Logging error ---
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/logging/__init__.py", line 1086, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/work/distributed/distributed/distributed/utils_perf.py", line 199, in _gc_callback
    logger.warning(
Message: 'full garbage collections took %d%% CPU time recently (threshold: %d%%)'
Arguments: (36.035444889464394, 10.0)

@graingert I just came across this as well. Do you know if this is something to be concerned with or is there another issue that references this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the "distributed.worker" logger gains ~thousand DequeHandler instances during a pytest run
4 participants