Fix infinite loop releasing the connection when the writer is not finished #7798
…ished aio-libs#7764 ensured the connection was released if the writer was not done yet, but it never cleared the writer so it would schedule callbacks in a loop
aiohttp/client_reqrep.py
- self._writer.add_done_callback(lambda f: self._release_connection())
+ self._writer.add_done_callback(
+     self._cleanup_writer_and_release_connection
+ )
I made it a separate function to ensure any future refactoring never generates a loop
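A minimal sketch of the idea behind this change, assuming the callback clears the writer before releasing. `FixedResponse`, `demo_fixed`, and the `calls`/`released` counters are hypothetical names for illustration, and a plain `asyncio` future stands in for the writer task; this is not aiohttp's actual implementation.

```python
import asyncio


class FixedResponse:
    """Hypothetical stand-in for the response object, not aiohttp's class."""

    def __init__(self, writer):
        self._writer = writer
        self.calls = 0          # how many times _release_connection ran
        self.released = False   # set once the release actually happens

    def _release_connection(self):
        self.calls += 1
        if self._writer is not None:
            # Writer still referenced: defer the release until it is done.
            self._writer.add_done_callback(
                self._cleanup_writer_and_release_connection
            )
            return
        self.released = True

    def _cleanup_writer_and_release_connection(self, fut):
        # Clear the writer first, so the next call takes the release branch
        # instead of scheduling yet another callback.
        self._writer = None
        self._release_connection()


async def demo_fixed():
    loop = asyncio.get_running_loop()
    writer = loop.create_future()   # assumption: a Future models the writer task
    resp = FixedResponse(writer)
    resp._release_connection()      # writer not done yet: callback registered
    writer.set_result(None)         # writer finishes
    for _ in range(20):             # give the event loop time to run callbacks
        await asyncio.sleep(0)
    return resp.calls, resp.released
```

Running `asyncio.run(demo_fixed())` in this sketch calls `_release_connection` exactly twice: once to register the callback and once to perform the release.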
Is there definitely an issue here? The writer task itself does self._writer = None, which is why I originally left this code like this. It might be worth making a change to make it clearer that the behaviour is correct; I'm just aware that we already have multiple blocks of code in this class doing almost identical things, so I was trying to avoid adding another one.
The loop is: _release_connection is called and self._writer is not None, so it calls self._writer.add_done_callback. The writer finishes and the callback fires, so _release_connection is called again. Since self._writer was never unset on ClientResponse, it calls self._writer.add_done_callback again, and since the writer is already done, the callback fires via call_soon, and the loop repeats.
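The loop described above can be reproduced in isolation. The sketch below is a simplified model, not aiohttp's code: `LoopingResponse`, `demo_loop`, and the `max_calls` safety cap are hypothetical, and a plain `asyncio` future stands in for the writer task. It relies on the documented behaviour that `add_done_callback` on an already finished future schedules the callback with `call_soon`.

```python
import asyncio


class LoopingResponse:
    """Hypothetical model of the buggy behaviour, not aiohttp's class."""

    def __init__(self, writer, max_calls):
        self._writer = writer
        self.calls = 0
        self._max_calls = max_calls  # safety cap so the demo terminates

    def _release_connection(self):
        self.calls += 1
        if self.calls >= self._max_calls:
            return  # without this cap the rescheduling would never stop
        if self._writer is not None:
            # Bug being modelled: _writer is never cleared, so once the
            # writer is done every call re-schedules itself via call_soon.
            self._writer.add_done_callback(
                lambda f: self._release_connection()
            )


async def demo_loop(max_calls=50):
    loop = asyncio.get_running_loop()
    writer = loop.create_future()   # assumption: a Future models the writer task
    resp = LoopingResponse(writer, max_calls)
    resp._release_connection()      # registers the first callback
    writer.set_result(None)         # writer finishes, callback is scheduled
    for _ in range(max_calls * 2):  # let the event loop spin
        await asyncio.sleep(0)
    return resp.calls
```

In this sketch `asyncio.run(demo_loop(50))` stops only because of the cap; each event loop iteration triggers one more call, which is consistent with the busy CPU reported in this thread.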
Also, it happened without intervention, and I only found the issue because the container was using 100% CPU.
Hmm, but why does the callback happen when the writer has not been reset? The task itself resets the attribute before completing...
Wait, just realised, that assignment in the init needs to trigger the callback logic. Just pushed a commit to ensure that happens. Can you try again with that one?
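One way to read this comment is that every assignment to the writer attribute, including the one in `__init__`, should pass through the same hook that resets it when the writer completes. The sketch below illustrates that reading with a property setter; it is an assumption about the approach, not the contents of the pushed commit. `ResponseWithWriterProperty`, `_writer_ref`, `_reset_writer`, and `demo_property` are hypothetical names.

```python
import asyncio


class ResponseWithWriterProperty:
    """Hypothetical sketch: route every _writer assignment through a setter."""

    def __init__(self, writer):
        self._writer_ref = None
        self._writer = writer  # goes through the setter below

    @property
    def _writer(self):
        return self._writer_ref

    @_writer.setter
    def _writer(self, writer):
        if self._writer_ref is not None:
            # Detach from the previous writer before replacing it.
            self._writer_ref.remove_done_callback(self._reset_writer)
        self._writer_ref = writer
        if writer is not None:
            if writer.done():
                # Already finished: nothing to wait for, clear immediately.
                self._writer_ref = None
            else:
                writer.add_done_callback(self._reset_writer)

    def _reset_writer(self, fut=None):
        # Runs when the writer completes, so the attribute never keeps
        # pointing at a finished writer.
        self._writer_ref = None


async def demo_property():
    loop = asyncio.get_running_loop()
    writer = loop.create_future()   # assumption: a Future models the writer task
    resp = ResponseWithWriterProperty(writer)
    held_before = resp._writer is writer
    writer.set_result(None)
    for _ in range(3):              # let the done callback run
        await asyncio.sleep(0)
    return held_before, resp._writer is None
```

With this shape the response holds the writer while it is running and drops it once it finishes, regardless of where the assignment was made.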
I'll give it a shot as soon as I get back home (in under an hour).
cherry-picked f45d9e4 cleanly, results shortly
Came up cleanly; the loop did not happen right away as before.
Profile is clean, py-spy is clean.
Will report back tomorrow, as the original symptoms only happened after about 12 hours (not sure on which request).
Everything ran fine overnight.
I rebooted a few switches and routers to generate some network chaos and everything recovered just fine
Closing this PR in favor of #7815
Codecov Report
@@            Coverage Diff             @@
##           master    #7798      +/-   ##
==========================================
- Coverage   97.42%   97.41%   -0.01%
==========================================
  Files         106      106
  Lines       32111    32115       +4
  Branches     3726     3725       -1
==========================================
+ Hits        31283    31286       +3
- Misses        626      627       +1
  Partials      202      202
What do these changes do?
#7764 ensured the connection was released if the writer was not done yet, but it never cleared the writer, so it would schedule callbacks in a loop.
Are there changes in behavior for the user?
Related issue number
Checklist
CONTRIBUTORS.txt
CHANGES folder: <issue_id>.<type>, for example (588.bugfix)
issue_id: change it to the pr id after creating the pr
.feature: Signifying a new feature.
.bugfix: Signifying a bug fix.
.doc: Signifying a documentation improvement.
.removal: Signifying a deprecation or removal of public API.
.misc: A ticket has been closed, but it is not of interest to users.