Asyncio Race Condition Leading to Infinite Loop #2001

Open
1 task done
TheTechromancer opened this issue Jul 26, 2024 · 1 comment

TheTechromancer commented Jul 26, 2024

This is a pyzmq bug

  • This is a pyzmq-specific bug, not an issue of zmq socket behavior. Don't worry if you're not sure! We'll figure it out together.

What pyzmq version?

26.0.3

What libzmq version?

4.3.5

Python version (and how it was installed)

Python 3.9 via apt

OS

Debian

What happened?

Recently I've run into a bug in CPython that directly affects ZMQ. It triggers whenever asyncio debugging is enabled and the ZMQ future blocks for more than 0.1 seconds:

https://github.com/python/cpython/blob/7c2921844f9fa713f93152bf3a569812cee347a0/Lib/asyncio/base_events.py#L2021-L2023
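The 0.1 second threshold mentioned above corresponds to asyncio's `loop.slow_callback_duration` attribute, which controls when debug mode logs a slow-callback warning. As a rough sketch, and not a confirmed fix for this issue, raising that threshold should make the warning (and the `repr()` call it performs) much less likely to fire. The helper name `raise_slow_callback_threshold` is made up for illustration, and other debug-mode code paths may still format tasks.

```python
import asyncio


async def raise_slow_callback_threshold(seconds: float) -> float:
    """Hypothetical helper: raise the slow-callback threshold on the running loop.

    Returns the previous threshold so the caller can restore it later.
    """
    loop = asyncio.get_running_loop()
    previous = loop.slow_callback_duration  # asyncio's documented default is 0.1
    loop.slow_callback_duration = seconds
    return previous


# Debug mode stays enabled; only the warning threshold changes.
previous = asyncio.run(raise_slow_callback_threshold(10.0), debug=True)
```

This only narrows the trigger window. It does not remove the underlying recursion on Python 3.10 and earlier.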

The bug is due to an unintended recursion that happens when repr() is called on an asyncio task. The recursion is caused by ZMQ's future storing references to other futures, including itself, which creates a circular reference. However, because each new layer of recursion must iterate over multiple futures, a RecursionError is never reached; instead, the process hangs with the CPU stuck at 100%.
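To get a feel for why iterating over multiple futures at every layer is so expensive, here is a toy model. It is not asyncio's or pyzmq's actual repr code: `ToyFuture`, `render`, and `count_renders` are hypothetical names, and the explicit `depth_left` cap merely stands in for however deep the real recursion goes. It only illustrates that the number of render calls grows exponentially with depth once each layer contains more than one self-reference.

```python
class ToyFuture:
    """Toy stand-in for a future whose repr includes its callbacks' arguments."""

    def __init__(self):
        self.callback_args = []


def render(fut, depth_left, counter):
    """Render `fut` recursively, counting every call in counter[0]."""
    counter[0] += 1
    if depth_left == 0:
        return "..."
    inner = ", ".join(
        render(arg, depth_left - 1, counter) for arg in fut.callback_args
    )
    return f"<ToyFuture cb_args=[{inner}]>"


def count_renders(branching, depth):
    """Count render calls for a future that references itself `branching` times."""
    fut = ToyFuture()
    fut.callback_args = [fut] * branching  # circular reference, as in the issue
    counter = [0]
    render(fut, depth, counter)
    return counter[0]


print(count_renders(1, 10))  # one self-reference: linear growth
print(count_renders(2, 10))  # two self-references: exponential growth
```

With a single self-reference the work grows linearly with depth; with two it roughly doubles per layer, so even a modest depth becomes effectively unbounded work.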

This is mainly a bug in CPython, and it was fixed in 3.11. However, 3.10 and earlier are still vulnerable, and based on the feedback on the CPython issue, the fix will not be backported to those older versions.

python/cpython#122296
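Since the fix is only in 3.11 and later, one possible stopgap for application code is to avoid turning on asyncio debug mode on older interpreters. The sketch below assumes that approach. Both function names are hypothetical, and it does not override `PYTHONASYNCIODEBUG=1` if that is already set in the environment.

```python
import sys


def asyncio_debug_is_safe(version_info=sys.version_info):
    """Return True on interpreters that include the fix (3.11 and later)."""
    return tuple(version_info[:2]) >= (3, 11)


def choose_debug_flag(requested, version_info=sys.version_info):
    """Only honor a request for debug mode where the repr recursion is fixed."""
    return bool(requested) and asyncio_debug_is_safe(version_info)


# Example: asyncio.run(main(), debug=choose_debug_flag(True))
```

The trade-off is losing asyncio's debug diagnostics on 3.10 and earlier, which may or may not be acceptable depending on why debug mode was enabled.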

I'm creating this issue so you're aware of it, and so anyone else googling for the problem can find it. This one was a beast to track down, since it only happens when PYTHONASYNCIODEBUG=1 is set and the ZMQ future blocks for more than 0.1 seconds. Hopefully it's helpful to someone.

Full traceback:
python_traceback.txt

Code to reproduce bug

import asyncio
import time
import functools

async def slow_callback():
    await asyncio.sleep(.1)
    time.sleep(.2)  # Blocking sleep to trigger the warning
    await asyncio.sleep(.1)

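async def main():
    task = asyncio.create_task(slow_callback())
    # The callback's bound arguments reference the task itself (twice), so
    # formatting the task's callbacks in debug mode can recurse into the task.
    task.add_done_callback(
        functools.partial(print, [task, task])
    )
    await task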

if __name__ == "__main__":
    asyncio.run(main(), debug=True)


minrk (Member) commented Jul 29, 2024

Thanks for the report! I'm not sure there's an easy fix, but if one turns up I'll give it a try. Hopefully this will help others find out what's going on, at least.
