
bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() #13971

Merged
merged 2 commits on Jun 11, 2019

Conversation

asvetlov
Contributor

@asvetlov asvetlov commented Jun 11, 2019

If internal tasks weak set is changed by another thread during iteration.

https://bugs.python.org/issue36607

If internal tasks weak set is changed by another thread during iteration.
@asvetlov asvetlov requested a review from 1st1 as a code owner June 11, 2019 10:54
@asvetlov asvetlov changed the title from "Eliminate RuntimeError raised by asyncio.all_tasks()" to "bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks()" on Jun 11, 2019
try:
    tasks = list(_all_tasks)
except RuntimeError:
    pass
Member

I don't like unchecked loops like that. There must be a retry count or something.

Contributor Author

The number of iterations is the question.
What is good enough to prevent a collision? 10? 100? 1000?
I think the number depends on the number of threads and the number of tasks inside every processed loop.

Member

I'd say 1000 should be enough. It's high enough to give it enough attempts and it's low enough to abort early if something weird is going on.
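For context, here is a sketch of roughly the shape the bounded-retry version takes, assuming the surrounding asyncio.tasks module namespace (events, futures and the module-level _all_tasks WeakSet); the retry cap of 1000 follows the number discussed above:

def all_tasks(loop=None):
    """Return a set of all tasks for the loop."""
    if loop is None:
        loop = events.get_running_loop()
    # list(_all_tasks) itself iterates the WeakSet, so a task added from
    # another thread can still make it raise RuntimeError; retry a bounded
    # number of times instead of looping forever.
    i = 0
    while True:
        try:
            tasks = list(_all_tasks)
        except RuntimeError:
            i += 1
            if i >= 1000:
                raise
        else:
            break
    return {t for t in tasks
            if futures._get_loop(t) is loop and not t.done()}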

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@@ -42,9 +42,19 @@ def all_tasks(loop=None):
     """Return a set of all tasks for the loop."""
     if loop is None:
         loop = events.get_running_loop()
-    # NB: set(_all_tasks) is required to protect
+    # NB: list(_all_tasks) is required to protect
     # from https://bugs.python.org/issue34970 bug
Member

Can we write a proper comment here without referencing an issue on bpo? I mean, a URL is fine, but it requires the reader to navigate to it in order to understand what's going on.

     # from https://bugs.python.org/issue34970 bug
-    return {t for t in list(_all_tasks)
+    # NB: Have to repeat on RuntimeError, other thread
Member

There's no point in prefixing all comments with "NB"; it's just a regular comment. I'd suggest writing something along the following lines (as it took me some time to understand why this change is necessary at all):

"Looping over a WeakSet (_all_tasks) isn't safe as it can be updated from another thread while we do so. Therefore we cast it to list prior to filtering. The list cast itself requires iteration, so we repeat it several times ignoring RuntimeErrors (which are not very likely to occur). See issues 34970 and 36607 for details."

@miss-islington
Contributor

Thanks @asvetlov for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@asvetlov asvetlov deleted the protect-all_tasks branch June 11, 2019 15:27
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 11, 2019
…onGH-13971)

If internal tasks weak set is changed by another thread during iteration.

https://bugs.python.org/issue36607
(cherry picked from commit 65aa64f)

Co-authored-by: Andrew Svetlov <[email protected]>
@bedevere-bot

GH-13975 is a backport of this pull request to the 3.8 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 11, 2019
…onGH-13971)

If internal tasks weak set is changed by another thread during iteration.

https://bugs.python.org/issue36607
(cherry picked from commit 65aa64f)

Co-authored-by: Andrew Svetlov <[email protected]>
@bedevere-bot

GH-13976 is a backport of this pull request to the 3.7 branch.

@nickdavies

This approach feels very wrong to me as we aren't fixing the underlying problem of multiple threads mutating the same global variable without locks.

Some more concrete issues:

  • We are trusting that RuntimeError is thrown in every case where this non-threadsafe mutation occurs. A simple scenario where something is removing entries (e.g. GC) while something else is adding them (e.g. another thread) can result in no RuntimeError at all; instead you could get some weird invalid behavior.

  • WeakSet/weakrefs are not threadsafe, and we are really lucky that, given the current code, it happens to crash in a consistent way. The size-change check during iteration is not a valid way of detecting this issue, and I fear that this patch (which repeatedly retries the racy operation) could hide the issue; any change to asyncio, weakref, etc. could turn the crash into corruption, memory leaks, or other strange behavior.

  • There is no theoretical limit to the number of times this loop can run. If you have a really heavy asyncio workload (e.g. hundreds of thousands of tasks spread over many threads), a sustained task-adding rate can be comparable to how long it takes to copy the list. It also makes this function perform at a very unpredictable speed: sometimes it copies the tasks list once, other times it might copy it tens or hundreds of times.

  • If you do reach the limit defined in the code, we are back to square one. A user can still write a perfectly valid program and have this error raised at them, in which case they have no way of avoiding the problem.

In my opinion the only valid solution to this problem is to remove the possibility of two threads mutating the _all_tasks global. The cleanest way to do this is to move the _all_tasks variable to be a property on the loop itself instead of a module-level variable (see the sketch below).
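A hedged sketch of that shape, assuming the asyncio module namespace (events); the per-loop attribute and the simplified filtering are illustrative only, and a real change would also need Task to register itself with its own loop:

import weakref

class BaseEventLoop:
    def __init__(self):
        # Hypothetical: each loop owns its task set, and it is only ever
        # mutated from the thread that runs this loop, so no cross-thread race.
        self._all_tasks = weakref.WeakSet()

def all_tasks(loop=None):
    if loop is None:
        loop = events.get_running_loop()
    # No module-level WeakSet shared between threads any more.
    return {t for t in list(loop._all_tasks) if not t.done()}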

miss-islington added a commit that referenced this pull request Jun 11, 2019
…3971)

If internal tasks weak set is changed by another thread during iteration.

https://bugs.python.org/issue36607
(cherry picked from commit 65aa64f)

Co-authored-by: Andrew Svetlov <[email protected]>
@1st1
Member

1st1 commented Jun 11, 2019

Using a lock is problematic because of GC.

Making all_tasks a loop property is a good idea. Andrew?

@nickdavies

Yeah, locks would likely be either impossible or almost impossible to implement correctly to solve this. I mentioned this on the issue as a complexity of trying to solve this with locks.

... this was my concern too; the adding and removing from the WeakDict[AbstractEventLoop, WeakSet[Task]] for _all_tasks could still cause issues. Specifically, the whole WeakSet class is not threadsafe, so I would assume WeakDict is the same; there may not be a nice way of ensuring that a combination of GC + the IterationGuard doesn't come along and mess up the dict, even if I wrap it in a threading lock.

@1st1
Member

1st1 commented Jun 11, 2019

One downside to making this a loop attribute is that the attribute becomes part of the AbstractEventLoop API. Other implementations (uvloop, tokio) would need to implement it. Another option is to add an indirection: a separate mapping of loop -> all_tasks.
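A hedged sketch of that indirection with hypothetical names: a module-level registry keyed by loop, so AbstractEventLoop itself gains no new attribute. The registry itself would still need care around concurrent loop creation and GC, as noted elsewhere in this thread:

import weakref

# Hypothetical module-level registry: one WeakSet of tasks per loop.
_tasks_per_loop = weakref.WeakKeyDictionary()

def _register_task(task, loop):
    # Would be called from Task.__init__, i.e. from the loop's own thread.
    _tasks_per_loop.setdefault(loop, weakref.WeakSet()).add(task)

def all_tasks(loop=None):
    if loop is None:
        loop = events.get_running_loop()
    tasks = _tasks_per_loop.get(loop, ())
    return {t for t in list(tasks) if not t.done()}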

@asvetlov
Contributor Author

Making all_tasks a loop property is a good idea.

Ideally yes.
I have no strong objection, but two comments:

  1. It adds a new public API. All existing loop implementations are not forward-compatible with the change; e.g. uvloop would have to be released immediately to work with the new asyncio version.
  2. Since this is a new API, the change is not acceptable for 3.7 at least. Maybe you can convince @ambv, but I prefer reserving one more thing for threaded child watchers :)

To defend the current try-again approach: yes, theoretically it can still fail; the PR just reduces the chance of failure by several orders of magnitude.
WeakSet is safe in the sense that the structure never crashes the interpreter, even when used by several threads in parallel (like dict and any other Python builtin collection).
It can return wrong data (extra or missing elements), yes. But the same problem exists if you analyze the tasks list just after getting the function's return value.

Suppose we have this code:

def all_tasks(loop=None):
    if loop is None:
        loop = events.get_running_loop()
    with _lock:
        tasks = list(_all_tasks)
    return {t for t in tasks if futures._get_loop(t) is loop and not t.done()}

The local tasks variable is a snapshot, not a synchronized (lock-protected) value.
Moreover, the next line filters out tasks from foreign loops.
It is very safe if the passed loop is None (the thread that executes the function is the same thread that executes the loop), so before the next yield point the all_tasks() return value is accurate.

If loop points at a loop that is executed in a different thread, then there is a chance of getting a return value that is out of sync with the actual state. Not sure if we can (and should) handle that reliably.

@1st1
Member

1st1 commented Jun 11, 2019

Re adding new API: I think I addressed that in my last comment :)

#13971 (comment)

@asvetlov
Contributor Author

Yes. Sorry, I read your comment just after sending mine.

@nickdavies

If loop points at a loop that is executed in a different thread, then there is a chance of getting a return value that is out of sync with the actual state. Not sure if we can (and should) handle that reliably.

I agree that we shouldn't try to support this, because using loops across threads is not supported already.

It can return wrong data (extra or missing elements), yes. But the same problem exists if you analyze the tasks list just after getting the function's return value.

Correct me if I am wrong, as I am not very familiar with the C implementations, but I feel this type of invalid data is very different from the time-of-copy vs time-of-use problem you are describing:

  • thread-safe copy (a perfect copy at that point in time): you are only going to miss new things added to your loop or completed tasks being removed. This can only happen if you are using loops across threads (e.g. asking for the tasks of a loop that is running on a different thread), which isn't supported.

  • non-thread-safe copy (currently possible): depending on how the code underneath is implemented, you may return two copies of the same task or have valid tasks missing. This could mean a user who is doing the right thing ends up with tasks that are never cancelled, for example (see the repro sketch below).
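A minimal, self-contained repro of the underlying race (nothing asyncio-specific; the RuntimeError is timing-dependent and may not fire on every run):

import threading
import weakref

class Obj:
    pass

ws = weakref.WeakSet()
keep = []                      # strong references so members are not collected immediately
stop = threading.Event()

def writer():
    # Simulates another thread creating tasks while we snapshot the set.
    for _ in range(200_000):
        if stop.is_set():
            break
        o = Obj()
        keep.append(o)
        ws.add(o)

t = threading.Thread(target=writer)
t.start()
try:
    while t.is_alive():
        list(ws)               # may raise "Set changed size during iteration"
    print("no RuntimeError this run (timing-dependent)")
except RuntimeError as exc:
    print("race hit:", exc)
finally:
    stop.set()
    t.join()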

@nickdavies

For reference we solved this in our backported version with this hack: https://gist.github.com/nickdavies/4a37c6cd9dcc7041fddd2d2a81cee383

The key difference is _patch_loop, which hacks in a way to keep the list of tasks local to the thread that created the loop.

@asvetlov
Contributor Author

I agree that we shouldn't try and support this because using loops across threads is not supported already.

We do support it. asyncio.all_tasks(loop) handles the case when loop.run_until_complete() is called from a non-current thread.

you may return two copies of the same task or have valid tasks missing

Two copies are impossible IMHO. Anyway, we make a set again before returning from all_tasks() function.

Missing valid tasks is possible if all_tasks() is called explicitly with a loop that is executed by another thread. This is not the case for asyncio.run().

CPython switches threads approximately every 5 ms (see sys.getswitchinterval()). 1000 retries is a bit of overkill; the second retry should already succeed. But keeping 1000 is safe and easy, which is why we use this (much higher than needed) value.

So, another implementation could save the second (rare enough) list(_all_tasks) call for usages like asyncio.run(), which is not a hot path. We can allow a retry on loop closing/finalizing.
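For reference, the switch interval mentioned above can be inspected and tuned at runtime:

import sys

print(sys.getswitchinterval())   # 0.005 by default, i.e. roughly 5 ms between forced thread switches
sys.setswitchinterval(0.01)      # larger values mean fewer, coarser switches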

@banool
Contributor

banool commented Jun 11, 2019

I just don't understand how having code in the core CPython implementation that is correct only most of the time is at all acceptable. It doesn't matter how unlikely the failure is; I expect the language itself to be correct 100% of the time. We should really aim for a deterministic fix.

@1st1
Member

1st1 commented Jun 11, 2019

CPython switches threads approximately every 5 ms (see sys.getswitchinterval()). 1000 retries is a bit of overkill; the second retry should already succeed. But keeping 1000 is safe and easy, which is why we use this (much higher than needed) value.

I think we should revert the change and implement it via the loop->all_tasks mapping. Or just revert it and not fix it if you have no time.

@asvetlov
Contributor Author

asvetlov commented Jun 11, 2019

@1st1 and I agree that the PR is not perfect.
It is just a fire-fighting job that can be done quickly and landed in a Python 3.7 bugfix release.

https://bugs.python.org/issue36607 is reverted to the "open" state.

Please feel free to propose such a PR; that would be awesome!
The loop->all_tasks mapping sounds like a good idea.

@1st1
Member

1st1 commented Jun 11, 2019

It is just a fire-fighting job that can be done quickly and landed in a Python 3.7 bugfix release.

Yeah, I'm OK with this fix for 3.7. The main reason is that this allows us to not touch a bunch of low-level C code in a bugfix release. For 3.8 we'll do better.

miss-islington added a commit that referenced this pull request Jun 11, 2019
…3971)

If internal tasks weak set is changed by another thread during iteration.

https://bugs.python.org/issue36607
(cherry picked from commit 65aa64f)

Co-authored-by: Andrew Svetlov <[email protected]>
@asvetlov
Contributor Author

@nickdavies your patch still has a low chance of failure:
a) task objects are referenced by other Python objects;
b) those objects have cross-references and need a GC run to be destructed;
c) GC is triggered while iterating over the weak set;
d) the iteration still raises RuntimeError.

The probability of failure is very, very low, but still not zero.

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020