-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Arc::new_with()
function for constructing self-referential data structures
#72443
Conversation
r? @cramertj (rust_highfive has picked a reviewer for you, use r? to override) |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Hm, using an |
Do you have a reference for that? IIRC think the current code is fine. (There are other issues but those are unrelated.)
So, where is the shared reference that points to data while it is being mutated? As far as I can tell, all we have here are a bunch of raw pointers, in which case you wouldn't need |
@RalfJung yeah, for example, look at Lines 866 to 870 in c60b675
This races with Lines 1680 to 1683 in c60b675
Firstly, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started reviewing with a few nits (and a potential unsoundness with atomic ordering), but then, after thinking about it, I've realised that the whole Weak
-as-input API is hard to make sound given how Weak
operates: anybody can call .upgrade()
on a Weak
or a .clone()
of it, which will call .inner()
.
And currently, .inner()
does promote the NonNull<ArcInner<T>>
to a &ArcInner<T>
, thus asserting both:
- non-aliasing of
data: T
(c.f., @Diggsey post above), - and in the future, that
data
is initialized. This incurs in the maintenance burden of having to be update the code if Rust "starts exploiting" that validity of the pointee is part of the validity invariant of a Rust reference (which it conservatively "officially" does: the very fact this currently isn't be a problem here is that the stdlib gets to know what isn't "exploited UB" yet).
So, instead, I suggest that, before this PR goes forward, Weak
's .inner()
be changed to return an Option<&'_ ArcInner<UnsafeCell<MaybeUninit<T>>>
(or an Option<&'_ ArcInner<()>>
with separate unsafe fn
s &data
accessors)
- Otherwise, the whole
new_with
does indeed require infecting everything withUnsafeCell
s, and would be relying on Rust references being allowed to refer to uninitialized data 😬
This may look cumbersome, but it will lead to the rest of Arc
's API to be at peace with Weak
s being, well, a very weak pointer type, that only gets to assert anything about its data
if and only if strong ≥ 1
.
Then, we could slightly simplify and thus improve new_with
, with something along these lines:
/// Constructs a new `Arc<T>` out of a closure that gets to have
/// `Weak` "back"-pointer to the to-be-returned `Arc`.
///
/// This is useful to create immutable graph-like structures, since
/// those may require back pointers.
///
/// Since the closure runs before such `Arc` is fully constructed,
/// the obtained `Weak` "dangles" while the closure is running:
/// any attempts to upgrade the `Weak` at that point will fail.
#[inline]
#[unstable(feature = "arc_new_with", issue = "none")]
pub fn new_with(data_fn: impl FnOnce(&'_ Weak<T>) -> T) -> Arc<T> {
let weak = Weak {
ptr: Box::into_raw_nonnull(box ArcInner {
strong: 0.into(),
weak: 1.into(),
data: UnsafeCell::new(mem::MaybeUninit::<T>::uninit()),
}).cast(),
};
let data = data_fn(&weak);
let inner: &'_ ArcInner<UnsafeCell<MaybeUninit<T>>> = weak.inner().unwrap();
unsafe {
inner
.data
.get() // exclusive access is guaranteed by `strong == 0`
.write(MaybeUninit::new(data))
;
// `data` has been initialized and is no longer (directly) mutated
// so we can finally allow other `Weak`s to be upgraded.
inner.strong.store(1, atomic::Ordering::Release);
// "transmute"-upgrade the `Weak` to an `Arc`
Arc::from_inner(ManuallyDrop::new(weak).ptr)
}
}
@danielhenrymantilla that sounds like a reasonable plan. What do you think about the variance issue that CI showed up? At the moment there is no syntax for "overriding" the variance of a type parameter, and any use of The only solution I can see is to change the pointer inside |
Ugh, I can't even do that, because there's no way to get |
I opened PR #72479 to address the UB as a first step. |
Just a few remarks regarding this PR : 1- There is no reason to restrict this addition to 2- Instead of adding a second-order function which first create a dead fn write_and_upgrade(ptr : Weak<T>, x : T) -> Option<Arc<T>> It would first check that the weak count is exactly 1, then allocate some memory and copy x there and finally turn the weak pointer into a strong one. Then, it is easy to re-implement |
That would break |
You are right. Then we would need to lift the restriction about the weak count. But then we need to use some kind of locking mechanism to make sure that the Another change we would need to make is reverting #50357 since we are actually upgrading a |
Yeah, I don't think any of that is worthwhile. Also, personally I would be pretty unhappy if |
This comment has been minimized.
This comment has been minimized.
I personally think that's a first-order API makes it possible to write more readable code, especially if there are several cycles to build at the same time. I guess that's a matter of taste... |
☔ The latest upstream changes (presumably #72639) made this pull request unmergeable. Please resolve the merge conflicts. |
I've updated this now that the UB fix has merged. |
@Diggsey what about @jhjourdan 's other point, that this should be added to |
TBH, I first wanted to establish that this is an API we want to add, before we get into the nitty gritty of documenting and testing it properly. I don't want to do all the work of implementing it only for everyone to agree that we don't want to extend the capabilities of Also, I agree there should be a counterpart for |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! But this is tricky enough to justify more eyes checking it.
Also, as discussed before, a similar API should be added to Rc
once we all agree that this is worthwhile.
r? @dtolnay |
Arc::new_with()
function for constructing self-referential data…Arc::new_with()
function for constructing self-referential data structures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could someone provide a realistic code snippet showing how this would be applied in real code and/or point out an open source library where this is applicable? The doc test shows some rationale but not necessarily how this would typically be applied in practice.
The only way to construct Being able to give out reference-counted pointers to yourself is a very useful operation, and similar methods exist in C++ for a class to obtain a https://en.cppreference.com/w/cpp/memory/enable_shared_from_this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, makes sense. I think I would be on board with this, possibly under a different name. Does Arc::new_cyclic
perhaps better capture the intended use case?
I have no particular preference as to the name: I went with |
@Diggsey waiting for you to update this PR and then we can get it reviewed again :) thanks |
☔ The latest upstream changes (presumably #73265) made this pull request unmergeable. Please resolve the merge conflicts. |
Should this be closed in favor of #75505? |
Yeah sorry never got around to updating this 😦 |
Add Arc::new_cyclic Rework of rust-lang#72443 References rust-lang#75861 cc @Diggsey @RalfJung r? @KodrAus
It should be noted that this exposes a new behaviour: a
Weak
can now become "live" after having previously observed it as being "dead".This does not impact the safety of the
Arc
because we ensure the weak reference count does not reach zero during this time.In the process this PR changes theArcInner::data
field to use anUnsafeCell
- my understanding of the unsafe code guidelines is that this change will become necessary even without the new API addition. However, the new function undeniably makes the data field internally mutable, whereas before there seemed to be some debate.To me, all of the prior uses of
ptr.as_mut()
seemed suspect as they could result in aliasing mutable references. Using anUnsafeCell
allows those usages to be replaced with a raw pointer which has no aliasing requirements.edit:
This UB was fixed in a separate PR.
Perhaps @RalfJung can provide some clarity on whether these additional changes are actually required.