-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-117657: Fix TSAN list set failure #118260
Conversation
Include/cpython/listobject.h
Outdated
@@ -44,7 +44,11 @@ PyList_SET_ITEM(PyObject *op, Py_ssize_t index, PyObject *value) { | |||
PyListObject *list = _PyList_CAST(op); | |||
assert(0 <= index); | |||
assert(index < list->allocated); | |||
#ifdef Py_GIL_DISABLED | |||
_Py_atomic_store_ptr_release(&list->ob_item[index], value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel right to me. Basically, PyList_SET_ITEM
is only safe in certain circumstances during list initialization before the list is exposed to other threads. If the list is later shared with other threads, it seems like there should be other synchronization involved to make the whole thing thread-safe.
Is there a reproducer for the Tsan reported race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I can hit it again, I didn't save the output :(. I hit it while following the steps in #117657
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is: https://gist.github.com/DinoV/4baac3cdfeace53a296aaee5d2a86996
So the lock is held, but the problem is that the lock doesn't get taken on the read. But I think a relaxed store would be good enough here. I'll continue running with that and see if there's any other races.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be inclined to make the change in _PyList_AppendTakeRefListResize
rather than PyList_SET_ITEM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a targetted test that repros it consistently under a TSAN build and we only need a relaxed load here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but I still think it makes more sense to modify _PyList_AppendTakeRefListResize
to use FT_ATOMIC_STORE_PTR_RELEASE
(or relaxed?) directly rather than modifying PyList_SET_ITEM
.
PyList_SET_ITEM
is used in ~2 places internally (_PyList_AppendTakeRef
and _PyList_AppendTakeRefListResize
) to modify existing lists while holding the lock. It's also used in many places to initialize newly created lists. The atomic operations make sense in the _PyList_AppendTakeRef*
calls, but not really for the other uses of PyList_SET_ITEM
.
I'm also not sure "relaxed" is strong enough for _PyList_AppendTakeRef*
-- we generally use "release" when assigning to the entries of ob_item
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the other places we're missing an atomic store is in list_extend_fast
but there seems to be some actual real crashes when we're extending and iterating at the same time that I'm looking into.
5916897
to
9298103
Compare
Include/cpython/listobject.h
Outdated
#ifdef Py_GIL_DISABLED | ||
_Py_atomic_store_ptr_relaxed(&list->ob_item[index], value); | ||
#else | ||
list->ob_item[index] = value; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the change to PyList_SET_ITEM
be reverted now that the problematic call sites use FT_ATOMIC_STORE_PTR_RELEASE
?
* Fix TSAN list set failure * Relaxed atomic is sufficient, add targetted test * More list * Remove atomic assign in list * Fixup white space
There's a failure in TSAN when we write to a list via
PyList_SET_ITEM
. There could be further problems using this inline function but callers need to either own the list locally or lock, but the reads can still proceed lock-free so this needs to be atomic. And it needs to be release semantics so that any values that are being published through the list will be visible.