asyncio: Use strong references for free-flying tasks #91887
Comments
You can use the new |
It looks to me like it would be a trivial code change (keeping all active tasks in a separate, non-weak set) to gain a large benefit here. This behavior is inherited from Futures (where it makes sense). If a future is not being kept track of, it's a bug (the developer forgot to await). However, tasks start executing on their own, even if not immediately, so not awaiting a task is not as obviously a mistake. Interrupting actively running code because someone didn't maintain a reference (on the gc's terms and timing) is a very different beast from not executing code whose intent and timing are unclear (and for which an exception can be triggered fairly promptly). I'll have time this evening to open a PR, if nobody else wants to get to it. |
For impact perspective, discord.py drops the task after calling create_task on it. This means (effectively) every python based discord bot is affected by this, and every event dispatch appears to be technically at the whims of the GC. |
Rather than rhetoric about impact or how trivial the fix would be, we need a discussion on why we designed tasks this way in the first place. It doesn't look to me like it was inherited from Futures -- the variable is called That said, I'm not sure why we designed it this way, and (from private email) @1st1 doesn't seem to recall either. But it was definitely designed with some purpose in mind -- the code is quite complex and was updated every time an issue with it was discovered, and we've had many opportunities to change this behavior but instead chose to update the docs (gh-29163), even when asyncio itself suffered (gh-90467). I also don't understand why the dict of all tasks is a global, since the only API that uses this ( FWIW if we simply make |
Agree that this should be approved by someone who is familiar with the code's design objectives, which is why I brought up two alternatives and did not create a PR right away. |
Sorry, I've been doing more reading into what the current state is and other related issues since I wrote that comment and it is absolutely a lot more detailed behind the scenes (like this comment here I only got to today: #80788 (comment) ) which mirrors some of what you've sent here. I was working with the idea that Tasks were inherited from Futures because that was how they were introduced in the original PEP-3156. I think I have found the origin of why it was made a weakset in the first place: https://groups.google.com/g/python-tulip/c/13hfgbKrIyY/m/HpwWPGHKT6IJ Specifically, this patch: So it was originally intended to be a registry to help recover stuck tasks and get stack frames for? |
(Sorry, I hit some wrong buttons.)
But apparently at that point it was already the case that tasks had to be kept alive by their owner -- ISTM a weakset was used specifically to avoid keeping tasks alive longer than necessary. So this has always been part of the design, it just wasn't made explicit in the docs. I'm still curious why we originally designed it that way. It's possible that we never consciously realized this constraint. It's also possible that, since we did make Task a subclass of Future, we assumed that tasks would always be awaited. I am still not convinced, despite this being a common stumbling point, that we can fix this without consequences for user code. |
#65362 made this make a little more sense to me. Tasks can't be dropped if they're actively executing or scheduled to execute (_ready) because there are strong references held by the event loop, and there's a comment that notes a bit about this at the start of the Task class, although I'm not sure if that's intended to apply to execution only or to reference management. Tasks only appear to be GCable if they're blocking on another future which ends up being GCable itself. If the blocking future is _ready/executing, then it, too, is kept alive through the above invariant, and there's no chance of losing the dependent task. It makes sense to error and garbage collect if the chain is broken, because we've got a "proven" memory leak (the task cannot be woken again if its dependent future is unreachable). |
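A minimal sketch of the situation described in this comment, assuming CPython's cycle collector and the weak task registry discussed in this thread. The names wait_on_unreachable_future and demo are hypothetical. One task is dropped right after creation and one is kept; both block on a future that nothing else references, and a collection is then forced.

```python
import asyncio
import gc
import weakref


async def wait_on_unreachable_future():
    # Only this coroutine's frame refers to the future, so the task and
    # the future keep each other alive purely through a reference cycle.
    await asyncio.get_running_loop().create_future()


async def demo():
    loop = asyncio.get_running_loop()
    messages = []
    # Capture what the loop reports instead of letting it be logged.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    # Case 1: user code keeps only a weak reference, i.e. the task is dropped.
    dropped = weakref.ref(asyncio.create_task(wait_on_unreachable_future()))
    # Case 2: user code keeps a strong reference, for comparison.
    kept = asyncio.create_task(wait_on_unreachable_future())

    await asyncio.sleep(0)  # let both tasks start and block on their futures
    gc.collect()            # force a collection instead of waiting for one

    result = (dropped() is None, kept.done(), messages)
    kept.cancel()           # tidy up the surviving task before returning
    return result


if __name__ == "__main__":
    collected, kept_done, reported = asyncio.run(demo())
    print("dropped task collected:", collected)
    print("kept task finished:", kept_done)
    print("loop reported:", reported)
```

The kept task stays pending because the local variable anchors the cycle from outside; the dropped task has no such anchor once it stops being scheduled.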
Yes, it is possible to fix this by making sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. But I have a feeling that this would be a somewhat fragile approach, both considering future changes in the stdlib and futures that a user creates. A year ago I Haven't thought of it like causing a provable memory leak, though. Good point. |
So how about adding a keyword argument to |
It was already the case that tasks had to be kept alive by their owner, but I'm not sure this was understood to be the case. If you look back at the diff that @bast0006 found, there were additional methods
It could just be a historical accident. IIRC, early on the event loop didn't have any knowledge of tasks, so
I appreciate your caution here, but I'll point out that the docs for Arguments for having
Arguments against:
The worst-case scenario that I can see if we make this change would be in a program that calls |
I would be interested in seeing "all_tasks" going from a state-global weakset of "alive tasks" to a per-loop strong-set of pending tasks. interestingly (maybe), my motivation is not enabling free-flying tasks - it is performance! some observations from trying to dig through the evolution of all_tasks from tulip to today:
so... considering there's some convoluted history behind all_tasks in general, and it being a WeakSet in particular, and changing this would make some users happy (free-flying tasks! perf!) and others unhappy (backwards incompatible change!) -- is there a realistic path to changing this? |
Give it a try, but aim for 3.13. (Just sitting on the PR for 2.5 weeks should do it.) |
The example in that test can be reduced to:
With a strong-set this will be kept alive, but with a weak-set it will be GC'd. I find this pretty unintuitive but it happens because there are no references to the In my mind, anyone writing code like this deserves a memory leak. They might even want a leak as this might be one way of spotting a programming error here. |
@itamaro Your comment seems similar to what I have planned in #80788 (comment) to do in 3.13. |
One way to avoid expanding the AbstractEventLoop interface would be to make
The Future will never resolve, but there is still one way the Task can be rescheduled: if someone cancels the task via the reference in
I've made the same argument in the other direction: you might want the "task was destroyed but it is pending" log message because it's a more legible indication of a problem than a memory leak (which needs to either be large or occur frequently to be noticeable). But I think that overall, if someone writes a task that waits forever, they should get a task that lives forever (with the memory consumption that implies). |
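A small sketch of the cancellation path mentioned above: a task that waits on a future nobody will ever resolve can still be woken if something reaches it through asyncio.all_tasks() and cancels it. The names waits_forever and demo are hypothetical, and a strong reference is kept here only so the sketch behaves deterministically.

```python
import asyncio


async def waits_forever(events):
    try:
        # Nobody ever sets a result on this future.
        await asyncio.get_running_loop().create_future()
    except asyncio.CancelledError:
        events.append("woken by cancel")
        raise


async def demo():
    events = []
    # Kept only to make the outcome independent of garbage collection.
    task = asyncio.create_task(waits_forever(events))
    await asyncio.sleep(0)  # let the task block on its future

    # Reach the task through the registry rather than through `task`.
    current = asyncio.current_task()
    for other in asyncio.all_tasks():
        if other is not current:
            other.cancel()

    await asyncio.gather(task, return_exceptions=True)
    return events, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a weak registry, whether the task is still listed when the loop over all_tasks() runs depends on whether a collection has happened in between, which is the inconsistency being discussed.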
Ah! but this is less likely to happen than you might think. Consider this runnable example:
This actually completes silently because the manually created task ends up being cancelled. |
But this will have the downside @itamaro and I want to avoid:
Interesting, I hadn't thought of cancelation to wake-up the task. Although overall this sounds like an even stronger argument for a strong-set as it will at least make things consistent.
@kumaraditya303 what you describe there sounds great if you're okay with using a strong-set. Is that the case? |
Per the warning in the asyncio documentation, we need to hold a strong reference to all asyncio Tasks to prevent premature GC. Following discussions in cpython (python/cpython#91887), we hold these references on the IOLoop instance to ensure that they are strongly held but do not cause leaks if the event loop itself is discarded. This is expected to fix all of the various "task was destroyed but it is pending" warnings that have been reported. The IOLoop._pending_tasks set is expected to become obsolete if corresponding changes are made to asyncio in Python 3.13. Fixes tornadoweb#3209 Fixes tornadoweb#3047 Fixes tornadoweb#2763 Some issues involve this warning as their most visible symptom, but have an underlying cause that should still be addressed. Updates tornadoweb#2914 Updates tornadoweb#2356
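The approach in this commit message, holding strong references on an owning object and releasing them when each task finishes, might look roughly like the sketch below. TaskOwner, spawn and pending_count are hypothetical names chosen for illustration; this is not Tornado's actual code.

```python
import asyncio


class TaskOwner:
    """Hypothetical helper that strongly references the tasks it spawns."""

    def __init__(self):
        self._pending_tasks = set()

    def spawn(self, coro):
        task = asyncio.get_running_loop().create_task(coro)
        self._pending_tasks.add(task)
        # Release the reference when the task finishes so the set cannot grow
        # without bound; discarding the owner discards the whole set with it.
        task.add_done_callback(self._pending_tasks.discard)
        return task

    def pending_count(self):
        return len(self._pending_tasks)


async def demo():
    owner = TaskOwner()
    results = []

    async def work(n):
        await asyncio.sleep(0)
        results.append(n)

    for n in range(3):
        owner.spawn(work(n))  # return value deliberately ignored
    started = owner.pending_count()

    # Yield to the loop until every done callback has run.
    while owner.pending_count():
        await asyncio.sleep(0)
    return started, owner.pending_count(), sorted(results)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Tying the set to an owner rather than to a module global means the references disappear together with the owner, which matches the stated goal of not leaking when the event loop itself is discarded.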
I can understand the concern, and I'm growing more open to the conclusion that some extra references to tasks are the lesser evil. I am a bit concerned, though, that this is a broader problem than just tasks, as it fundamentally has to do with how the event loop works rather than with tasks specifically. Perhaps weak references should be viewed as something very dangerous and the barrier for accepting usage of them should be much higher? What I'd also like to avoid is throwing out the good effects the current system has. Specifically, that the test |
The discussion here so far has been very long term, future proofing. However, this test case is currently broken:

    import asyncio
    import gc

    async def background():
        remote_read, remote_write = await asyncio.open_connection("example.com", 443, ssl=False)
        await remote_read.read()

    async def cleanup():
        while True:
            gc.collect()
            await asyncio.sleep(3)

    async def main():
        asyncio.create_task(background())
        await cleanup()

    asyncio.run(main())

Should I open a new issue for that, or do we deal with it here? |
@CendioOssman If you have a solution in mind, you can submit a PR and link it to this issue in the PR title. |
I have an idea at least. I'll open a PR once I have something that seems to work. However, after looking more at this, the active task list unfortunately doesn't completely solve this. So I'm back towards preferring some ban on weak references. You can hit this issue even without coroutines:

    import asyncio
    import functools
    import gc
    import weakref

    class Proto:
        def __init__(self, reader):
            self.reader_wr = weakref.ref(reader)
            asyncio.get_event_loop().call_later(20, self._wake_reader)

        def _wake_reader(self):
            self.reader_wr().wake()

    class Reader:
        def __init__(self):
            self.proto = None

        def set_proto(self, proto):
            self.proto = proto

        def wait(self):
            self.waiter = asyncio.get_event_loop().create_future()
            return self.waiter

        def wake(self):
            self.waiter.set_result(None)

    class LogFunc:
        def __init__(self):
            self._done = False

        def __del__(self):
            if not self._done:
                context = {
                    'message': 'Callback was destroyed but it is pending!',
                }
                asyncio.get_event_loop().call_exception_handler(context)

        def __call__(self, *args, **kwargs):
            self._done = True
            print("Foo", args, kwargs)

    def start():
        reader = Reader()
        proto = Proto(reader)
        reader.set_proto(proto)
        fut = reader.wait()
        fut.add_done_callback(functools.partial(LogFunc(), reader, proto))

    async def cleanup():
        while True:
            gc.collect()
            await asyncio.sleep(3)

    async def main():
        start()
        await cleanup()

    asyncio.run(main())

The above example constructs the same weak reference structure as the streams API, but there are no tasks involved that could be used to anchor things. |
That example just shows what weakrefs do: they don't keep an object alive. That's a feature of weakrefs. If you want the object to stay alive, don't use a weakref. So what's the point of your example? |
Indeed. But the example uses the same design as So this is the type of code we are trying to make "safe" by providing a strong reference elsewhere. The point of the example was to show that the suggested fix of having a list of tasks will be insufficient in that goal. So something more would be needed. |
@CendioOssman I'm afraid I've lost track of what you're arguing. Likely you are in violent agreement with the other participants here. You speak of a ban on weak refs. Do you mean everywhere, or everywhere in asyncio, or in a specific place in asyncio? Or do you mean this as a general recommendation to asyncio users? And you claim that a list of active tasks isn't sufficient. I'd think that would be sufficient for preventing active tasks from being lost -- are there other things in asyncio that aren't kept alive? (Handles or callbacks?) |
@CendioOssman Thank you for the example in #91887 (comment), this is the simple non-Tornado example we're looking for. No weakrefs here (unless there are some in asyncio itself). I'm a little confused by your most recent example, but the first one works to illustrate the problem we're discussing in this issue. For those following along who haven't run the code, the example I linked will log a "Task was destroyed but it is pending" message for the So this is an illustration of the guideline in the docs to save a reference to the result of |
I agree with @gvanrossum that just because it's possible to construct a scenario where a weak reference's object is destroyed, that doesn't necessarily mean that that's a problem. As I see it, the entire idea behind garbage collection is that objects that are no longer reachable by user code can be destroyed. Since user code is executed by tasks in asyncio, the natural assumption is that strong references are held from the side that executes user code, i.e. by tasks. Apparently the asyncio standard library was also designed with this intuition in mind. Introducing a |
Is there still an action item here? If not, I can close it (either as "won't fix" or "fixed" depending on the mood). |
Yes, there's still an issue here. There's some discussion of weak references and whether we should care, but that's kind of a distraction and we have examples without weak references and only using core asyncio code. This minimal example was provided by @CendioOssman in #91887 (comment) and I'll repeat it here:

    import asyncio
    import gc

    async def background():
        remote_read, remote_write = await asyncio.open_connection("example.com", 443, ssl=False)
        await remote_read.read()

    async def cleanup():
        while True:
            gc.collect()
            await asyncio.sleep(3)

    async def main():
        asyncio.create_task(background())
        await cleanup()

    asyncio.run(main())

This is a background task which does not follow the docs' guidance to hold a strong reference to the result of If this were the only instance, perhaps we could say that asyncio is working as documented and nothing needs to change. But going back to the original message in this issue, I continue to think that a global |
I've been meaning to come back to this issue and provide some tangible change suggestions. So I would appreciate it if things could be kept open for a while longer. :) |
Let's call the example with classes So let's focus on the shorter example with coroutines Like Ben, I don't immediately see a downside to having a global set of pending tasks either, as long as tasks are guaranteed to remove themselves from it when they complete (and without the need for That doesn't mean there isn't a downside, but we'll never find out until we try. We'll probably have to ask @1st1 to think about this -- there might be a reason that involves uvloop or Edgestore. Possibly the fact that uvloop has its own task factory makes things more complicated -- we should definitely think about that some more. Perhaps we can use a done callback that unlinks the task when it completes? @CendioOssman, are you interested in coming up with a PR? Or do you continue to feel that the other example must also be fixed? In that case we may have to agree to disagree, and someone else can create a PR. |
Here's a concrete example of a hack to workaround this issue https://github.com/Falmarri/podman-compose/blob/8d8fa54855ce7eb73e802ef06e9f48645a30e2ac/podman_compose.py#L1229-L1237 It's possible there's a better way of structuring this, but IMO this should just be the default where I don't have to care about this. I can just start these tasks and not have to worry. |
I have a feeling we need a new champion to drive a PR here. I think a global or per-loop (non-weak) set of active tasks should solve the issue, but there are a bunch of details that need sorting through -- notably how we ensure that 3rd party tasks (e.g. from uvloop) are properly inserted into and removed from the set, without requiring changes to the 3rd party library. |
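As a user-level illustration of the "non-weak set of active tasks" idea (not the change proposed for CPython itself), a task factory can register every task the loop creates and unlink it with a done callback. The names _strong_tasks, strong_ref_task_factory and demo are hypothetical; the sketch assumes the default asyncio loop, and a third-party loop with its own task class would need its own factory.

```python
import asyncio
import gc
import weakref

_strong_tasks = set()  # hypothetical registry holding strong references


def strong_ref_task_factory(loop, coro, **kwargs):
    # Forward whatever keyword arguments the loop passes (for example a
    # name or context on newer Python versions) straight to asyncio.Task.
    task = asyncio.Task(coro, loop=loop, **kwargs)
    _strong_tasks.add(task)
    # Unlink the task once it completes, successfully or not.
    task.add_done_callback(_strong_tasks.discard)
    return task


async def wait_on_unreachable_future():
    await asyncio.get_running_loop().create_future()


async def demo():
    loop = asyncio.get_running_loop()
    loop.set_task_factory(strong_ref_task_factory)

    # User code keeps only a weak reference to the new task.
    ref = weakref.ref(asyncio.create_task(wait_on_unreachable_future()))
    await asyncio.sleep(0)  # let the task block on its future
    gc.collect()

    survived = ref() is not None
    tracked_before = len(_strong_tasks)

    # Clean up: cancel the registered tasks and wait for them to unlink.
    for task in list(_strong_tasks):
        task.cancel()
    while _strong_tasks:
        await asyncio.sleep(0)
    return survived, tracked_before, len(_strong_tasks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off raised in the thread applies here as well: a task that waits forever now lives forever, so the registry only shrinks when tasks actually finish or are cancelled.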
cc'ing @kumaraditya303, in relation to gh-104787 and #80788 (comment) - are you still planning to tackle this for 3.13? |
@itamaro @kumaraditya303 Unless one of you is actively working on a PR for this, I would recommend we move this item back to TO DO. Thoughts? |
Agreed. I haven't worked on this. |
Moving back to To Do for now. |
Flagging this issue for discussion at the core dev sprint unless there is a champion before the sprint. |
I have created a draft PR at #121264. In this PR, I have not made this feature optional. I'm open to adding the task to
Potential for Memory Leaks
Whenever a future's state transitions from the
Use of
|
In #88831 @vincentbernat pointed out that CPython only keeps weak references in _all_tasks, so a reference to a Task returned by loop.create_task has to be kept to be sure the task will not be killed with a "Task was destroyed but it is pending!" at some random point in time.

When shielding a task from cancellation with await shield(something()), something continues to run when the containing coroutine is cancelled. As soon as that happens, something() is free-flying, i.e. there's no reference from user code anymore. shield itself has a bunch of circular strong references, but these shouldn't keep CPython from garbage-collecting the task. Hence, here the same problem occurs and the task might be killed unpredictably. Additionally, when running coroutines in parallel with gather and return_exceptions=False, an exception in one of the coroutines will leave remaining tasks free-flying. Also in this case, the remaining tasks might be killed unpredictably.

Hence, a warning in the documentation for create_task unfortunately does not suffice to solve the problem. Additionally, it has been brought up in #88831 that an API for fire-and-forget tasks (i.e. when the user doesn't want to keep a reference) would be nice.

As a solution, I suggest to either

(1) introduce a further _pending_tasks set to keep strong references to all pending tasks. This would be the simplest solution also with respect to the API. In fact, a lot of discussions on Stack Overflow (e.g., here, here, here) already rely on this behavior (throwing away the reference returned by create_task), although it's wrong currently. Since the behaviour for free-flying tasks is unpredictable currently, it should not introduce any compatibility issues when making it predictable by preventing them from being garbage-collected.

(2) make sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. A quick grep resulted in potential problems, e.g., here and here. This does not seem like a very robust approach, though.

(3) introduce the concept of background tasks, i.e., tasks the user does not want to hold references to. The interface could look like suggested in #88831 (comment) . Tasks from shield and gather could be automatically converted to such background tasks. Clearly, it would add complexity to the API, but the distinction between normal tasks and background tasks might potentially be beneficial also for other purposes. E.g., one might add an API call that waits for all background tasks to be completed.

My preferred solution would be (1).
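The shield case from this report can be sketched as follows. The names something, outer and demo are hypothetical. In this sketch the inner task happens to stay reachable through the Event it waits on, so it finishes reliably; the report's concern is the general case, where the thing being awaited is not anchored anywhere.

```python
import asyncio


async def something(release, finished, log):
    await release.wait()
    log.append("something finished")
    finished.set()


async def outer(release, finished, log):
    # shield() wraps the coroutine in a task that user code never sees.
    await asyncio.shield(something(release, finished, log))


async def demo():
    release = asyncio.Event()
    finished = asyncio.Event()
    log = []

    outer_task = asyncio.create_task(outer(release, finished, log))
    await asyncio.sleep(0)  # let outer() start and call shield()

    outer_task.cancel()
    try:
        await outer_task
    except asyncio.CancelledError:
        log.append("outer cancelled")

    # The shielded task is still running even though outer() is gone.
    release.set()
    await finished.wait()
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

After the cancellation, no user variable refers to the inner task; only the chain from the Event's waiter future keeps it alive here.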