Python memory leaks on exit with outstanding calls #7121
What's the reference cycle?
Sorry, I misspoke. It is a hanging reference. When we put an operation on the wire, we manually call …
I think I'm still not understanding "When python exits with outstanding calls" - doesn't the use of a cleanup thread inside the channel mean that calls will be cancelled when that thread is joined? Are we seeing this leak-on-exit in the output of our unit tests? If so, which ones? If not, what would a unit test to exercise this defect look like? Or a sample user story, if a unit test would be too weird? How certain are we that leak-on-exit-and-spew-some-logging is a GA blocker? To me it sounds like a minor annoyance that would be merely nice to fix.
Yes, the _exit_tests.py tests show this behavior. gcloud + gax feels like a tier 1 use case to me. I'm amenable to pushing this off the GA milestone if you disagree.
I'm still not understanding the defect itself - why doesn't the call get cancelled? It's a managed call, so why doesn't the cleanup thread cancel it? You mention a …
Sorry, to clarify: this is for synchronous calls. They cannot use the cleanup thread.
Synchronous response-unary calls?
Yes. |
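The reason a synchronous call can't be serviced by the cleanup thread can be illustrated with a stdlib sketch (names here are illustrative, not grpc's API): the caller's own thread is the one blocked inside the call, so there is no separate thread responsible for driving or cancelling it when the interpreter exits.

```python
import threading

response_ready = threading.Event()

def blocking_unary_unary():
    """Sketch of a synchronous response-unary call: the calling thread itself
    blocks until a response arrives (here, until the event is set)."""
    response_ready.wait(timeout=0.1)   # stands in for waiting on the completion queue
    return response_ready.is_set()

# The main thread sits inside the call. If the interpreter exits here
# (e.g. via Ctrl-C), no channel-owned thread is positioned to cancel it.
got_response = blocking_unary_unary()
print(got_response)  # False: the (simulated) server never answered
```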
... and what's the interpreter exit mechanism? Ctrl-C at the keyboard, or a signal, or something the application initiates?
Ctrl-C at the keyboard. In the tests it is simulated as a SIGINT sent to the process, but the intent is for Ctrl-C interpreter exit. I haven't tested calling …
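A test can simulate Ctrl-C by sending SIGINT to a child process, in the spirit of the exit tests mentioned above. This is a generic, POSIX-only sketch (the child's sleep stands in for a blocking RPC; none of this is the actual test code):

```python
import signal
import subprocess
import sys
import time

# The child blocks indefinitely, standing in for a synchronous unary call
# that never receives its response.
child_src = "import time\ntime.sleep(30)\n"

proc = subprocess.Popen([sys.executable, "-c", child_src])
time.sleep(0.5)                      # give the child time to reach the blocking "call"
proc.send_signal(signal.SIGINT)      # simulate Ctrl-C at the keyboard
proc.wait(timeout=10)

# An unhandled SIGINT surfaces as KeyboardInterrupt, so the child exits nonzero.
print(proc.returncode != 0)          # True
```

A real exit test would additionally inspect the child's stderr for core's leak warning.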
I'm just not making the connections I need to make. I'm not getting "the operation tag which holds a reference to the call", because for synchronous (blocking) response-unary calls we use …
The operation tag reference is held at the Cython layer: …
This issue could probably be solved by removing only the Cython reference, and leaving our reference to managed calls to be cleaned up by the CleanupThread. I'd like to move in the direction of less state tracking because of possible performance gains, but I'd be fine with making a minimal change to fix the issue pre-GA and revisiting unneeded state tracking post-GA.
I'd originally added the tag-as-GC-root thing to attempt to ensure memory safety whether or not things went wrong in the Python layers (attempting to ensure no use-after-frees from Python). That said, I can't immediately come up with a situation where removing that reference could threaten that, so I think I'm (tentatively) fine with that reference going away.
Although... regardless of that reference being there (or not), I'm feeling like a more proper solution would be pumping all queues on exit...
AFAICT, it's only safe to pump a queue after shutting it down (so you know when to stop), and that should only be done on completion queue destruction. The issue there is that calls have references to their completion queues. I actually played around with removing that reference to do what you are suggesting: #7122. As it turns out, it is possible for the completion queue to go out of scope before the calls, and if that happens, nothing cancels the calls, and then the completion queue hangs trying to shut down, so I abandoned that idea.
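The "only safe to pump after shutdown" point can be sketched with a stdlib queue. This is a simplified stand-in for completion-queue semantics (the `FakeCompletionQueue` class and `SHUTDOWN` sentinel are assumptions for the sketch, loosely mirroring core's shutdown event): once shutdown is requested, draining is well-defined because the sentinel tells the drain loop when to stop.

```python
import queue

SHUTDOWN = object()  # stands in for core's queue-shutdown event

class FakeCompletionQueue:
    """Sketch: after shutdown(), next() yields remaining events, then the sentinel."""
    def __init__(self):
        self._events = queue.Queue()

    def push(self, event):
        self._events.put(event)

    def shutdown(self):
        self._events.put(SHUTDOWN)

    def next(self):
        return self._events.get()

cq = FakeCompletionQueue()
cq.push("op-complete:call-1")
cq.push("op-complete:call-2")

# Safe drain pattern: shut down first so the pump loop has a stopping point.
cq.shutdown()
drained = []
while True:
    event = cq.next()
    if event is SHUTDOWN:
        break
    drained.append(event)
print(drained)  # ['op-complete:call-1', 'op-complete:call-2']
```

Note this sketch sidesteps the hazard described above: if nothing ever cancels the outstanding calls, the real queue's shutdown never completes and the drain hangs.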
As I read and reread this issue, I get the sense that the invariant that's being violated is "calls that have operations started on them either get those operations finished or get cancelled", and that it's being violated just in the blocking sections of …
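That invariant suggests a try/finally-style discipline around the blocking section. The following is a minimal sketch of the idea, not the project's code (`FakeCall` and `blocking_section` are invented names): if the blocking wait is interrupted, cancel the call before propagating, so operations are always either finished or cancelled.

```python
class FakeCall:
    """Stand-in for a call with operations in flight."""
    def __init__(self):
        self.finished = False
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

def blocking_section(call, interrupt=False):
    """Uphold the invariant: cancel on interruption, otherwise finish."""
    try:
        if interrupt:
            raise KeyboardInterrupt  # stands in for Ctrl-C during the blocking wait
        call.finished = True
    except KeyboardInterrupt:
        call.cancel()                # never leave started operations dangling
        raise

ok = FakeCall()
blocking_section(ok)

interrupted = FakeCall()
try:
    blocking_section(interrupted, interrupt=True)
except KeyboardInterrupt:
    pass

# The invariant holds on both paths: finished or cancelled, never neither.
print(ok.finished, interrupted.cancelled)  # True True
```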
I don't believe that will work. We still need to get the call off the completion queue by calling …. As far as #7090 goes, the issue is much harder to address. #7090 is a result of a cancellation before connection. An AVL tree is set up with "possible" connections, and gets destroyed when one of the connections is actually used. When …
@kpayson64 I'd be glad to try it. Given that the repository is not Python-only, can you suggest a …
@tseaver If that doesn't work you could try: …
|
Quick note:
|
I've verified that this is fixed with the helloworld example. |
Has this change been released? If not, is there an ETA? I'd also prefer to use pip and whatnot. |
When Python exits with outstanding calls, a hanging reference at the Cython level prevents proper cleanup, leading to a memory leak warning.