Resolve UB in Arc/Weak interaction #72479
r? @dtolnay (rust_highfive has picked a reviewer for you, use r? to override)
(First of all, let's cc @RalfJung for their UB expertise.)

Hmmm, I think we can make fewer changes: I don't think that the definition of `Arc` needs to be changed. AFAIK, the "only" / main offender here is a `Weak` calling `.inner()` while a unique `Arc` is around. That `Arc` believes it is allowed to do whatever it wants with the data, but it cannot, because `.inner()` (transitively yielding a `&'_ T` shared reference to the data) can be called on a simultaneous `Weak` instance.
So I suggest keeping everything as it was, except for the return type of `.inner()`, which should be changed to return an `Option<_>` of:

- either `&'_ ArcInner<UnsafeCell<MaybeUninit<T>>>` (the `MaybeUninit` part setting things up for further evolutions, such as the one in your other PR),
- or, more simply, `&'_ ArcInner<1-ZST>` (which would make `.inner()` effectively equivalent to a `.counters()` getter returning a `struct Counters<'inner> { strong: &'inner AtomicUsize, weak: &'inner AtomicUsize }`, which is fine since that's exactly what `.inner()` should be doing before knowing that `strong ≥ 1`!).
In both cases, the core idea is that a `Weak` gets to look at its counters, mainly the `strong` one, before ever trying to assert anything about the `.data` / to peek at it. This way, a unique `Arc` owner can effectively take ownership of the `ArcInner`'s `.data` by clearing that `.strong` counter, so as to no longer worry about remaining `Weak`s.
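A minimal sketch of that idea, assuming a simplified `#[repr(C)]` stand-in for `ArcInner` (called `Inner` here, hypothetical) and a hypothetical `counters()` getter shaped like the `Counters<'inner>` struct above. The counters are reached through `ptr::addr_of!` field projections, so no reference to the whole allocation (and hence to `data`) is ever created, and an upgrade only succeeds after bumping a non-zero `strong` count. This is not the libstd code; the orderings are modeled loosely on the usual `Arc` pattern.

```rust
use std::ptr::{self, NonNull};
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-in for `ArcInner` (hypothetical): `#[repr(C)]` keeps the
// counters at fixed offsets in front of `data`.
#[repr(C)]
struct Inner<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

// The `Counters<'inner>` view suggested above: it borrows the atomics only.
struct Counters<'inner> {
    strong: &'inner AtomicUsize,
    weak: &'inner AtomicUsize,
}

/// Hypothetical getter.
/// # Safety
/// `p` must point to a live allocation of `Inner<T>` whose counters stay
/// valid for `'a`; `data` may already have been dropped.
unsafe fn counters<'a, T>(p: NonNull<Inner<T>>) -> Counters<'a> {
    let p = p.as_ptr();
    unsafe {
        Counters {
            // `addr_of!` projects to each field without creating a reference
            // to the whole `Inner<T>`, so nothing is asserted about `data`.
            strong: &*ptr::addr_of!((*p).strong),
            weak: &*ptr::addr_of!((*p).weak),
        }
    }
}

/// Upgrade attempt: succeeds only by bumping a non-zero strong count, so the
/// caller may touch `data` only when this returns `true`.
/// # Safety
/// Same requirements as `counters`.
unsafe fn try_upgrade<T>(p: NonNull<Inner<T>>) -> bool {
    let c = unsafe { counters(p) };
    let mut n = c.strong.load(Ordering::Relaxed);
    loop {
        if n == 0 {
            // The last strong owner is gone (or is dropping `data`): never peek.
            return false;
        }
        match c.strong.compare_exchange_weak(n, n + 1, Ordering::Acquire, Ordering::Relaxed) {
            Ok(_) => return true,
            Err(actual) => n = actual,
        }
    }
}
```

Once `strong` has been cleared, `try_upgrade` returns `false` without ever forming a reference that covers `data`, which is what lets the unique owner treat `data` as its own.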
Finally, there remains the question of what the fields of `Weak` ought to be. Given that the pointer may already be dangling, and that the blessed way to dereference it is therefore through the guarded `.inner()` getter, I'd say that we could let the field(s) remain as is. If, on the other hand, we really want to enforce / express at the type level that using that getter is mandatory, we could change `Weak` to become:
```diff
+ struct ErasedPayload;
+ type Counters = ArcInner<ErasedPayload>; // thanks to `#[repr(C)]`-ness of `ArcInner`
  struct Weak<T : ?Sized> {
-     ptr: NonNull<ArcInner<T>>,
+     ptr: NonNull<Counters>,
+     _phantom: PhantomData<ArcInner<T>>,
  }
```
So
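To make the type-level variant concrete, here is a sketch that assumes sized `T` only. Erasing a `?Sized` pointer to a thin `NonNull<Counters>` would discard its metadata, which a real `Weak` would have to keep somewhere. Type and field names mirror the diff above, and `from_inner`, `strong_count` and `shorten` are hypothetical helpers. The `PhantomData<ArcInner<T>>` keeps `Weak<T>` covariant in `T`, which `shorten` checks at compile time, while the stored pointer only ever exposes the counter prefix.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

#[repr(C)]
struct ArcInner<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

// Zero-sized placeholder: with `#[repr(C)]`, `ArcInner<ErasedPayload>` has the
// same layout as the counter prefix of any `ArcInner<T>`.
struct ErasedPayload;
type Counters = ArcInner<ErasedPayload>;

struct Weak<T> {
    ptr: NonNull<Counters>,
    // Restores the variance (covariant in `T`) that the erased pointer lost.
    _phantom: PhantomData<ArcInner<T>>,
}

impl<T> Weak<T> {
    // Hypothetical constructor: erase the payload type right away.
    fn from_inner(ptr: NonNull<ArcInner<T>>) -> Self {
        Weak { ptr: ptr.cast(), _phantom: PhantomData }
    }

    /// # Safety
    /// The allocation (at least its counters) must still be live.
    unsafe fn strong_count(&self) -> usize {
        // Only the counter prefix is reachable through the stored pointer.
        unsafe { self.ptr.as_ref().strong.load(Ordering::Acquire) }
    }
}

// Compiles only because `Weak<T>` is covariant in `T`.
fn shorten<'a>(w: Weak<&'static str>) -> Weak<&'a str> {
    w
}
```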
Okay so I agree there is a race, e.g. when someone calls

But IMO the "most obvious" solution to this is to make
I think both of you are not wrong: there are other ways that may require fewer changes to make this sound. However, I chose this way because I think it's easier to reason about. Without the changes to

`ptr::drop_in_place(&mut self.ptr.as_mut().data);`

(specifically, as soon as this executes:

From a soundness perspective, I think this is OK (although I'm not exactly confident) because we're only accessing the

On the other hand, using an

The only thing I don't like about this approach is the need to cast "ptr" back and forth to solve the variance issue: ideally we could just always use the
@danielhenrymantilla
There are very few situations when references are easier to reason about than raw pointers. You are doing some rather extreme hacks here with

Raw pointers make this a lot simpler, and avoid all these hacks you need to "tame" shared references and avoid all the problems. Working with aliasing data is exactly what raw pointers are for. Sure, we could replace almost every
If

EDIT: Oh wait, the mutable reference is only used for
Yeah this is what I was trying to get at.
Consistently using raw pointers everywhere "inner" is accessed would be a much larger change than this PR, and that style of programming is really not very ergonomic in Rust. Using it in some places but not others means you still have to worry about reference-related problems, so why not embrace shared references everywhere? I don't really see this as any different from an

The only hacky part here, as far as I'm concerned, is the variance issue, which requires us to cast the pointer type back and forth, but this is encapsulated in the `inner()` method, so most of the code doesn't have to concern itself with that.
I don't think changes everywhere would be needed. And I think selectively using raw pointers in what you identified to be the "critical" parts (dropping, where we need a mutable reference only to the data, and

It is true that raw pointers are less ergonomic. To make them more ergonomic, we need to get a better understanding of the kind of patterns one would use them for. We'll never get that understanding if we turn ourselves upside down to avoid raw pointers wherever possible.

If that were true, we should make

I maintain that using raw pointers would be a lot cleaner. If you find someone else to r+ this as-is, I won't block it, but I will not approve this change until raw pointers have at least been tried.
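A sketch of the raw-pointer drop path being argued for. It assumes a simplified `#[repr(C)]` `Inner<T>` (hypothetical, not libstd's `ArcInner`) allocated with `Box`, and a hypothetical function name `drop_data_then_release_weak`. Mutable access is confined to the `data` field via `ptr::addr_of_mut!`, so no `&mut` to the whole allocation exists while a `Weak` may be reading the counters. As in std's `Arc`, the strong owners are assumed to collectively hold one implicit weak reference, which is released at the end.

```rust
use std::alloc::{dealloc, Layout};
use std::ptr::{self, NonNull};
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Simplified stand-in for `ArcInner` (hypothetical), allocated via `Box`.
#[repr(C)]
struct Inner<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

/// Hypothetical slow-drop path, run after the strong count reached zero.
/// # Safety
/// The caller must have released the last strong reference, and `p` must
/// come from `Box::into_raw(Box::new(Inner { .. }))`.
unsafe fn drop_data_then_release_weak<T>(p: NonNull<Inner<T>>) {
    let raw = p.as_ptr();
    unsafe {
        // Mutable access is limited to `data`: no `&mut Inner<T>` is created,
        // so a `Weak` reading the counters concurrently is not invalidated.
        ptr::drop_in_place(ptr::addr_of_mut!((*raw).data));

        // Release the implicit weak reference held by the strong owners. The
        // last weak reference frees the memory without dropping `data` again.
        if (*ptr::addr_of!((*raw).weak)).fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            dealloc(raw.cast::<u8>(), Layout::new::<Inner<T>>());
        }
    }
}
```

Because `data` is dropped in place and the memory is released with `dealloc` rather than `Box::from_raw`, the payload's destructor runs exactly once, while the counters stay readable until the last weak reference goes away.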
These "critical parts" include `Arc::drop_slow`, and every single usage of
I view this as a case where we are enforcing a property (covariance) through our safe interface, and the compiler simply has no way of knowing that. In most cases, we have special types and traits to express these properties (

```rust
unsafe impl<T> Covariant<T> for Arc<T> {}
unsafe impl<T> Covariant<T> for Weak<T> {}
```

To allow variance to be overridden. If we had that syntax, then we wouldn't need to do any casting back and forth at all.
I agree that this is not ideal; I just see it as a temporary evil that is fairly well encapsulated, and that can be removed once we have some way to override variance.
Correction: it hasn't been stabilized yet. I think libstd is just the right place to start using it.
I find it unlikely that we will have such a way in the near or medium-term future. Thanks a lot for preparing that other PR! I much prefer that approach.
Resolve UB in Arc/Weak interaction (2)
Use raw pointers to avoid making any assertions about the data field. Follow-up from rust-lang#72479; see that PR for more detail on the motivation. @RalfJung I was able to avoid a lot of the changes to `Weak` by making a helper type (`WeakInner`): because of auto-deref and because the fields have the same names, the rest of the code continues to compile.
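A sketch of why such a helper keeps the rest of the code compiling. It assumes a `WeakInner` that simply holds references to the two counters; its exact definition in the follow-up PR may differ, and `weak_inner` and `inc_weak!` are hypothetical names. Method calls auto-deref through `&AtomicUsize`, so the very same expression type-checks against both `ArcInner` and `WeakInner`.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

#[repr(C)]
struct ArcInner<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

// Hypothetical `WeakInner`: same field names as `ArcInner`, but the fields are
// references to the counters, so no reference to `data` is ever formed.
struct WeakInner<'a> {
    strong: &'a AtomicUsize,
    weak: &'a AtomicUsize,
}

/// # Safety
/// `p` must point to an allocation whose counters are still live for `'a`.
unsafe fn weak_inner<'a, T>(p: *const ArcInner<T>) -> WeakInner<'a> {
    unsafe {
        WeakInner {
            strong: &*ptr::addr_of!((*p).strong),
            weak: &*ptr::addr_of!((*p).weak),
        }
    }
}

// The same tokens compile against either type: `x.weak.fetch_add(..)` works
// whether `weak` is an `AtomicUsize` or a `&AtomicUsize`.
macro_rules! inc_weak {
    ($inner:expr) => {
        $inner.weak.fetch_add(1, Ordering::Relaxed)
    };
}
```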
There's UB in the current `Arc` implementation, see:

rust/src/liballoc/sync.rs
Lines 866 to 870 in c60b675

This races with `Weak::inner`:

rust/src/liballoc/sync.rs
Lines 1680 to 1683 in c60b675
Calling `drop_in_place` mutates the `data` value, which AIUI is UB when elsewhere we have a live shared reference to the containing struct (unless we mark the field as internally mutable somehow).

The implementation for this was a little tricky: we can't just switch to using an `UnsafeCell<ManuallyDrop<T>>`, as this changes the variance of `Arc` and `Weak` from covariant to invariant, so instead we make sure to never dereference the stored pointer: we always cast it to `ArcInner` before dereferencing.

Follow-up from #72443
cc @danielhenrymantilla
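One way to read "cast it before dereferencing", sketched with a hypothetical single-counter `MyArc<T>` rather than the real `Arc`. The stored `NonNull<Inner<T>>` mentions plain `T`, so the type stays covariant, and `inner()` casts it to `Inner<UnsafeCell<T>>` before any dereference. The two have the same layout, since `UnsafeCell<T>` has the same in-memory representation as `T` and `Inner` is `#[repr(C)]`. The PR mentions `UnsafeCell<ManuallyDrop<T>>`; this sketch drops `ManuallyDrop` for brevity and omits weak pointers and `Send`/`Sync` impls.

```rust
use std::cell::UnsafeCell;
use std::ptr::NonNull;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

#[repr(C)]
struct Inner<T> {
    strong: AtomicUsize,
    data: T,
}

struct MyArc<T> {
    // Declared with plain `T`, so `MyArc<T>` stays covariant in `T` ...
    ptr: NonNull<Inner<T>>,
}

impl<T> MyArc<T> {
    fn new(value: T) -> Self {
        let boxed = Box::new(Inner { strong: AtomicUsize::new(1), data: value });
        MyArc { ptr: NonNull::from(Box::leak(boxed)) }
    }

    // ... but every dereference goes through this cast, so `data` is only
    // ever seen as `UnsafeCell<T>`, i.e. as interior-mutable.
    fn inner(&self) -> &Inner<UnsafeCell<T>> {
        unsafe { self.ptr.cast::<Inner<UnsafeCell<T>>>().as_ref() }
    }

    fn get(&self) -> &T {
        unsafe { &*self.inner().data.get() }
    }
}

impl<T> Clone for MyArc<T> {
    fn clone(&self) -> Self {
        self.inner().strong.fetch_add(1, Ordering::Relaxed);
        MyArc { ptr: self.ptr }
    }
}

impl<T> Drop for MyArc<T> {
    fn drop(&mut self) {
        if self.inner().strong.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            // Last owner: reclaim the `Box` that `new` leaked.
            unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
        }
    }
}

// Compiles only because `MyArc<T>` is covariant; with a field of type
// `NonNull<Inner<UnsafeCell<T>>>` this function would be rejected.
fn shorten<'a>(a: MyArc<&'static str>) -> MyArc<&'a str> {
    a
}
```

The cast is the price of keeping covariance: the compiler sees only the `NonNull<Inner<T>>` field when computing variance, while all accesses go through the `UnsafeCell` view.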