asyncio doesn't warn if a task is destroyed during its execution #65362
Some tasks created via asyncio are vanishing because there is no reference to their resultant futures. This behaviour does not occur in Python 3.3.3 with asyncio-0.4.1. Also, doing a gc.collect() immediately after creating the tasks seems to fix the problem. Attachment also available at https://gist.github.com/richardkiss/9988156
Ouch. That example is very obfuscated -- I fail to understand what it is trying to accomplish. Running it I see that it always prints 100 for the count with 3.3 or with DO_GC on; for me it prints 87 with 3.4 and DO_GC off. But I wouldn't be surprised if the reason calling gc.collect() "fixes" whatever issue you have is that it causes there not to be any further collections until the next cycle through that main loop, and everything simply runs before being collected. But that's just a theory based on minimal understanding of the example.

I'm not sure that tasks are *supposed* to stay alive when there are no references to them. It seems that once they make it past a certain point they keep themselves alive.

One more thing: using a try/except I see that the "lost" consumers all get a GeneratorExit on their first iteration. You might want to look into this. (Sorry, gotta run, wanted to dump this.)
I agree it's confusing and I apologize for that.

Background: this multiplexing pattern is used in pycoinnet, a bitcoin client I'm developing at <https://github.com/richardkiss/pycoinnet>. The BitcoinPeerProtocol class multiplexes protocol messages into multiple asyncio.Queue objects so each interested listener can react. An example listener is in pycoinnet.helpers.standards.install_pong_manager, which looks for "ping" messages and sends "pong" responses. When the peer disconnects, the pong manager sees a None (to indicate EOF), and it exits. The return value is uninteresting, so no reference to the Task is kept.

My client is in late alpha, and mostly works, but when I tried it on Python 3.4.0, it stopped working and I narrowed it down to this. I'm not certain this behaviour is incorrect, but it's definitely different from 3.3.3, and it seems odd that a GC cycle BEFORE additional references can be made would allow it to work.
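For context, the listener pattern described here looks roughly like the following sketch (hypothetical code in modern async/await syntax, not pycoinnet's actual implementation, which used @asyncio.coroutine and yield from):

```python
import asyncio

async def pong_manager(queue):
    """Fire-and-forget listener: reacts to 'ping' messages until EOF."""
    while True:
        msg = await queue.get()
        if msg is None:             # None signals that the peer disconnected
            return                  # the return value is uninteresting
        if msg == "ping":
            print("sending pong")   # stand-in for writing a pong response back

# The task exists only for its side effects, so no reference to it is kept:
#     asyncio.ensure_future(pong_manager(queue))
```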
Most likely your program is simply relying on undefined behavior and the right way to fix it is to keep strong references to all tasks until they self-destruct.
I'll investigate further.
You were right: adding a strong reference to each Task seems to have solved the original problem in pycoinnet. I see that the global list of asyncio tasks is a WeakSet, so it's necessary to keep a strong reference myself.

This does seem a little surprising. It can make it trickier to create a task that is only important for its side effects. Compare to threaded programming: unreferenced threads are never collected. For example:

```python
f = asyncio.Task(some_coroutine())
f.add_done_callback(some_completion_callback)
f = None
```

In theory, the "some_coroutine" task can be collected, preventing "some_completion_callback" from ever being called. While technically correct, it does seem surprising. (I couldn't get this to happen in a simple example, although I did get it to happen in a complicated one.)

Some change between 3.3 and 3.4 makes garbage collection much more aggressive about collecting unreferenced tasks, which means broken code like mine, which worked in 3.3, fails in 3.4. This may trip up other early adopters of tulip. Maybe adding a "do_not_collect=True" flag to asyncio.async or asyncio.Task, which would keep a strong reference by putting the task into a singleton set (removing it again via a done callback), would bring attention to this subtle issue. Or displaying a warning in debug mode when a Task is garbage-collected before finishing.

Thanks for your help.
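A sketch of the workaround that keeps such side-effect-only tasks alive: park them in an application-owned set until they finish. The spawn() helper is made up for illustration (ensure_future() is the later spelling of asyncio.async()):

```python
import asyncio

_background_tasks = set()   # application-owned strong references

def spawn(coro):
    """Create a task that matters only for its side effects, and keep it alive."""
    task = asyncio.ensure_future(coro)
    _background_tasks.add(task)                          # strong reference until done
    task.add_done_callback(_background_tasks.discard)    # release it once finished
    return task
```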
Thanks for understanding. It's definitely subtle: there is also some code in asyncio that logs an error when a Future holding an exception becomes unreachable before anyone has asked for it; this has been a valuable debugging tool, and it depends on *not* holding references to Futures.

Regarding the difference between Python 3.3 and 3.4, I don't know the exact reason, but GC definitely gets more precise with each new revision, and there are also some differences in how exactly the debug feature I just mentioned is implemented (look for _tb_logger in asyncio/futures.py).

OTOH it does seem a little odd to just GC a coroutine that hasn't exited yet, and I'm not 100% convinced there *isn't* a bug here. The more I think about it, the more I think that it's suspicious that it's always the *first* iteration that gets GC'ed. So I'd like to keep this open as a reminder.

Finally, I'm not sure the analogy with threads holds -- a thread manages OS resources that really do have to be destroyed explicitly, but that's not so for tasks, and any cleanup you do need can be handled using try/finally. (In general drawing analogies between threads and asyncio tasks/coroutines is probably one of the less fruitful lines of thinking about the latter; there are more differences than similarities.)
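The debugging aid mentioned here can be seen with a short hypothetical sketch: a task whose exception is never retrieved logs an error once it becomes unreachable, which only works because asyncio does not keep a strong reference to it:

```python
import asyncio
import gc

async def boom():
    raise RuntimeError("nobody ever asks for this result")

async def main():
    asyncio.ensure_future(boom())   # no reference kept, result never awaited
    await asyncio.sleep(0)          # let the task run and store its exception
    gc.collect()                    # once the failed task is unreachable, asyncio logs
                                    # "Task exception was never retrieved"

asyncio.run(main())
```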
Ok, I agree that this issue is very tricky :-)

The first problem in asyncio-gc-issue.py is that the producer keeps *weak* references to the Queue objects, so the Queue objects are quickly destroyed, especially if gc.collect() is called explicitly.

When "yield from queue.get()" is used in a task, the task is paused. The queue creates a Future object and the task registers its _wakeup() method on that Future. When the queue object is destroyed, the internal future object (used by the get() method) is destroyed too. The last reference to the task was in this future object; as a consequence, the task is also destroyed.

While there is a bug in asyncio-gc-issue.py, it's very tricky to understand, and I think asyncio should help developers detect such bugs. I propose the attached patch, which emits a warning if a task is destroyed while it is not done (its status is still PENDING). I wrote a unit test which is much simpler than asyncio-gc-issue.py; read the test to understand the issue. I added many comments to explain the state.

My patch was written for Python 3.4+: it adds a destructor to the Task class, and we cannot add a destructor to Future objects because these objects are likely to be part of reference cycles. See the following issue, which proposes a fix. Using that fix for reference cycles, it may be possible to emit the log in Tulip (Python 3.3) as well.
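A minimal sketch of the reference chain described above (modern syntax; exact GC timing may vary across versions). The only strong reference to the suspended task lives inside the queue's internal future, so dropping the queue lets the whole chain be collected while the task is still PENDING, which is what the patch warns about:

```python
import asyncio
import gc

async def consumer(queue):
    await queue.get()      # suspends on an internal Future created by the queue

async def main():
    queue = asyncio.Queue()
    asyncio.ensure_future(consumer(queue))   # note: no reference to the task is kept
    await asyncio.sleep(0)                   # let the consumer start and block in get()
    del queue                                # drop the last external reference to the chain
    gc.collect()                             # the still-pending task can now be collected,
                                             # triggering "Task was destroyed but it is pending!"

asyncio.run(main())
```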
The more I use asyncio, the more I am convinced that the correct fix is to keep a strong reference to a pending task (perhaps in a set in the event loop) until it starts. Without realizing it, I implicitly made this assumption when I began working on my asyncio project (a bitcoin node) in Python 3.3. I think it may be a common assumption for users; ask around. I can say that it made the transition to Python 3.4 very puzzling.

In several cases, I've needed to create a task where the side effects are important but the result is not. Sometimes this task is created in another task which may complete before its child task begins, which means there is no natural place to store a reference to this task. (Goofy workaround: wait for the child to finish.)
The problem is not the task; read my message again. The problem is that nobody holds a strong reference to the Queue, whereas the producer is supposed to fill this queue and the task is waiting for it. I cannot make a suggestion on how to fix your example; it depends on what you want to do.

Sorry, I don't understand the relation between this issue and the Python version (3.3 vs 3.4). Do you mean that Python 3.4 behaves differently? The garbage collection in Python 3.4 has been *improved*: Python 3.4 is able to break more reference cycles. If your program no longer runs on Python 3.4, it may mean that you rely on a reference cycle, which sounds very silly.

I'm not sure that this is the same issue. If you think so, could you please write a short example showing the problem?
Premature task garbage collection is indeed hard to debug. But at least, with your patch, one gets an exception and has a chance to track the bug down. So I'm +1 for the patch.

As for having strong references to tasks: it may have its own downsides, such as hard-to-debug memory leaks. I'd rather have my program crash and/or have your patch report the problem than search for obscure code that eats all server memory once a week. I think we need to collect more evidence that the problem is common and annoying before making any decisions on this topic, as that's something that will be hard to revert. Hence I'm -1 for strong references.
Patch looks good. Go ahead.
I reread more carefully, and I am in agreement now that I better understand what's going on. Thanks for your patience.
I committed my change in Tulip (78dc74d4e8e6) and in Python 3.4 and 3.5: changeset 91359:978525270264, changeset 91360:e1d81c32f13d.
The new check emits a lot of "Task was destroyed but it is pending!" messages when running test_asyncio. I'm keeping the issue open as a reminder that I have to fix them.
New changeset 1088023d971c by Victor Stinner in branch '3.4': New changeset 7877aab90c61 by Victor Stinner in branch 'default': |
New changeset e9150fdf068a by Victor Stinner in branch '3.4': New changeset d92dc4462d26 by Victor Stinner in branch 'default': |
New changeset 4e4c6e2ed0c5 by Victor Stinner in branch '3.4': New changeset 24282c6f6019 by Victor Stinner in branch 'default': |
I fixed the first "Task was destroyed but it is pending!" messages where the fix was simple. The attached dont_log_pending.patch fixes the remaining messages when running test_asyncio. I'm not sure yet that this patch is the best approach to fix the issue. Modified functions, with an example of a related test for each:
=> related test: test_run_until_complete_stopped() of test_events.py
=> related test: test_wait_errors() of test_tasks.py
=> related test: test_one_exception() of test_tasks.py
=> related test: test_baseexception_during_cancel() of test_tasks.py
Hum, dont_log_pending.patch is not correct for wait(): wait() returns (done, pending), where pending is a set of pending tasks. So it's still possible that pending tasks are destroyed while they are still pending, after the end of wait(). The log should not be made quiet here.
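For illustration, the wait() situation looks like this hypothetical sketch: the pending set returned by wait() is the only handle the caller gets for the unfinished tasks, so whether they stay alive afterwards is the caller's decision, and the warning stays useful if that set is dropped:

```python
import asyncio

async def main():
    tasks = [asyncio.ensure_future(asyncio.sleep(d)) for d in (0.1, 5.0)]
    done, pending = await asyncio.wait(tasks, timeout=1.0)
    # If the caller simply dropped 'pending', the unfinished tasks would become
    # unreferenced while still pending; here they are cancelled explicitly instead.
    for task in pending:
        task.cancel()

asyncio.run(main())
```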
New changeset 13e78b9cf290 by Victor Stinner in branch '3.4': New changeset 2d0fa8f383c8 by Victor Stinner in branch 'default': |
New changeset f13cde63ca73 by Victor Stinner in branch '3.4': New changeset a67adfaf670b by Victor Stinner in branch 'default': |
New changeset 6d5a76214166 by Victor Stinner in branch '3.4': New changeset fbd3e9f635b6 by Victor Stinner in branch 'default': |
New changeset e4fe6706b7b4 by Victor Stinner in branch '3.4': New changeset a627b23f57d4 by Victor Stinner in branch 'default': |
Ok, I fixed the last warnings emitted by the unit tests run in debug mode. I'm closing the issue.
I finally wrapped my head around this. I wrote a (simpler) script to get a better picture.

What happens:

- When a consumer task is first instantiated, the loop holds a strong reference to it (_ready).
- Later on, as the loop starts, the consumer task is yielded and it waits on an unreachable future. The last strong ref to it is lost (loop._ready). It is not collected immediately because it has just created a reference cycle.
- gc.collect() called *before* the tasks are ever run has the weird side effect of moving the automatic gc collection forward in time.
- gc.collect() called after all the consumers are waiting on the unreachable future reaps all consumer tasks as expected. No bug in garbage collection.
- Yielding from asyncio.sleep() prevents the consumers from being collected, because the loop holds a strong reference to the timer handle that will eventually wake them up.

Summing up: tasks that have no strong refs may be garbage collected unexpectedly or not at all, depending on which future they yield to (see the sketch after this comment). It is very difficult to debug and understand why these tasks disappear.

Side note: the patches submitted and merged in this issue do emit the relevant warnings when PYTHONASYNCIODEBUG is set. This is very useful.

Proposed enhancements:

1. Document this behaviour: a task with no strong references to it may (or may not) be garbage collected before it is done, so users must keep their own references to tasks they care about.
2. Have asyncio itself keep pending tasks alive, e.g. by holding a strong reference to the future a task is waiting on, the way asyncio.sleep() effectively already does.
If you also think 1. or 2. are needed, let me know and I'll try to cook up a patch. Sorry for the noise.
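A compact sketch of the asymmetry summarized above (hypothetical, modern syntax; GC timing may vary): a task suspended on a bare, otherwise-unreferenced future sits in an isolated reference cycle and can be collected, while a task suspended in asyncio.sleep() is kept alive because the loop's timer queue references its wakeup callback:

```python
import asyncio
import gc

async def waits_on_bare_future():
    await asyncio.get_running_loop().create_future()   # nothing else references this future

async def waits_on_sleep():
    await asyncio.sleep(3600)                          # loop.call_later() keeps the wakeup alive

async def main():
    asyncio.ensure_future(waits_on_bare_future())      # no strong references kept, on purpose
    asyncio.ensure_future(waits_on_sleep())
    await asyncio.sleep(0)                             # let both tasks start and suspend
    gc.collect()                                       # the first task may be reaped
                                                       # ("Task was destroyed but it is pending!"),
                                                       # while the sleeping one survives
    await asyncio.sleep(0.1)

asyncio.run(main())
```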
I'm all in favor of documenting that you must keep a strong reference to a task that you want to keep alive.
Asking the user to manage strong refs is just passing the potential problem down to the user. If the user gets the strong refs wrong he can either lose tasks or leak memory. If the standard library gets it right, both issues are avoided.
So you are changing your mind and withdrawing your option #1. I don't have the time to really dig deeply into the example app and what's going on; I'll be on vacation most of this week.
Submitted a first stab at #2. Let me know what you think. If this works we'll have to remove the test_gc_pending test, and then maybe even the code that now logs errors when a pending task is gc'ed.
I don't understand how keeping a strong reference would fix anything.

I dislike the idea of keeping strong references to tasks; it may create reference cycles or memory leaks.

The current unit test uses low level functions to remove a variable.
The original asyncio-gc-issue.py wasn't written by me, and yes, as you say it does have the reference bug you describe. I argue that bug shouldn't cause tasks to die: it should rather limit (as gc proceeds) the number of queues available to the producer in the WeakSet(), leaving alive all the consumers waiting on an unreachable queue. Please look at my test2.py or, even better, test3.py for a simpler example.

Note that in my issue_22163_patch_0.diff I only keep strong refs to the futures a task is waiting on, just as asyncio already does with the asyncio.sleep() coroutine.
I dislike the idea of randomly losing tasks. I also dislike the idea of forcing the user to manage strong refs to their tasks. All 3rd party libraries will have to invent their own way, and it will lead to even more leaks/cycles that are very hard to debug. Not just exceptions: every time a task is yielding on a future, asyncio creates a reference cycle.
My issue_22163_patch_0.diff only clears references by setting variables to None. My test2.py example script also doesn't use any low level stuff. I just uploaded test3.py with a simpler (and possibly more realistic) example.
Sorry for keeping this alive. Take a look at the wait_for.py example from bpo-22448. Do you think my proposed patch is OK? Should I open a new issue?
I'm not sure how that wait_for.py example from bpo-2116 relates to this issue -- it seems to demonstrate the opposite problem (tasks are kept alive even though they are cancelled). Then again I admit I haven't looked deeply into the example (though I am sympathetic with the issue it purports to demonstrate). |
(Whoops meant to link to bpo-22448.) |