Make AllocId decoding thread-safe #50957
Conversation
Repeating my worries from the other PR:

Additionally, I don't think we should do this at all, even considering it works now, because it will cause all those bugs again if we allow

    let mut foo = Rc::new(RefCell::new(None));
    let bar = Rc::new(RefCell::new(Some(foo.clone())));
    *foo.borrow_mut() = Some(bar);

within constants. #49172 is a first step in that direction. Any cyclic pointer structure inside constants will not work with the system proposed in this PR. |
I want to take a closer look at this. |
So, I think the difference between this and what we had before the table approach is that this would also work for circular allocation graphs if we cache the |
src/librustc/mir/interpret/mod.rs
Outdated
trace!("encoding {:?} with {:#?}", alloc_id, alloc); | ||
AllocKind::Alloc.encode(encoder)?; | ||
alloc.encode(encoder)?; | ||
cache(encoder).insert_same(alloc_id, pos); |
`insert_same()` doesn't seem like what we want here. If this could be reached in a racy way then `pos` would not necessarily be the same. It would also mean that two encoders would write to the same stream. Something like `assert!(cache(encoder).insert(alloc_id, pos).is_none())` seems more appropriate. Correct me if I'm wrong.
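To illustrate the distinction, here is a minimal sketch using a plain `HashMap` and made-up key types, assuming `insert_same` merely tolerates re-insertion of an equal value (the real helper lives in rustc and may behave differently):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the encoder's position cache, keyed by AllocId.
type Cache = HashMap<u64, usize>;

// Assumed semantics of `insert_same`: re-insertion is fine as long as the
// value is unchanged. A racy second encode is only caught if it happens to
// land at a different `pos`.
fn insert_same(cache: &mut Cache, alloc_id: u64, pos: usize) {
    if let Some(old) = cache.insert(alloc_id, pos) {
        assert_eq!(old, pos, "AllocId cached with two different positions");
    }
}

// The stricter variant suggested above: encoding the same AllocId twice is a
// bug, no matter which position the second attempt produces.
fn insert_unique(cache: &mut Cache, alloc_id: u64, pos: usize) {
    assert!(cache.insert(alloc_id, pos).is_none(), "AllocId encoded twice");
}
```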
It looks good to me, provided we support encoding circular graphs, as noted above. |
This is exactly the same situation we had before. The case I mentioned will still happen. Imagine the following steps:
Since thinking about this tends to fry my brain, I created a google doc illustrating the issue: https://docs.google.com/presentation/d/1AWwnDxuZKZgj1PvWo5mPiwhapmV5h3bjUVn-De-tpKc/edit?usp=sharing
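To make the ordering issue concrete, here is a small self-contained sketch (toy types only, nothing from rustc) of an encoder that records an allocation's position before recursing into its relocations; caching the id up front is what turns a cycle into a back reference instead of infinite recursion:

```rust
use std::collections::HashMap;

/// Toy allocation: some bytes plus the ids of other allocations it points at.
struct Alloc {
    bytes: Vec<u8>,
    relocations: Vec<u64>,
}

struct Encoder<'a> {
    allocs: &'a HashMap<u64, Alloc>,
    out: Vec<u8>,
    cache: HashMap<u64, usize>, // AllocId -> position of its encoding in `out`
}

impl<'a> Encoder<'a> {
    fn encode_alloc_id(&mut self, id: u64) {
        if let Some(&pos) = self.cache.get(&id) {
            // Already encoded, or currently being encoded further up the
            // stack: emit a back reference. This is what terminates cycles.
            self.out.push(1); // tag: allocation at position
            self.out.extend_from_slice(&(pos as u32).to_le_bytes());
            return;
        }
        // Record the position *before* encoding the body; doing it afterwards
        // would make a cyclic graph recurse forever.
        let pos = self.out.len();
        self.cache.insert(id, pos);
        self.out.push(0); // tag: inline allocation
        let allocs = self.allocs; // copy of the shared reference
        let alloc = &allocs[&id];
        self.out.extend_from_slice(&(alloc.bytes.len() as u32).to_le_bytes());
        self.out.extend_from_slice(&alloc.bytes);
        self.out.extend_from_slice(&(alloc.relocations.len() as u32).to_le_bytes());
        for &target in &alloc.relocations {
            self.encode_alloc_id(target);
        }
    }
}
```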
While you can pre-cache the [...], we also cannot encode this skip-bytes amount, because at encoding time we don't know how far they are. We could reserve 4 bytes and write back the skip amount later, but that'll get horrible fast.

That said, I think we should just do this, because, as @Zoxc correctly pointed out to me some time ago, we will (in the future) refactor

    enum AllocId<'tcx> {
        Static(DefId),
        Function(Instance<'tcx>),
        Local(u64),
    }

where |
Oh, that said: yes, please insert loads of sanity checks, as @michaelwoerister already pointed out. I'd rather have sensible assertions triggered than really weird decoding errors later in the pipeline. This mainly means asserting that the return value of any |
@oli-obk, I'm wondering if case 3 in your presentation wouldn't just work (although it would decode Alloc(99) twice):
Yes, please! |
We are also creating real AllocIds, so we'd need to ensure we don't create new ones for the same allocation. And then we need to guarantee that the second interning produces the exact same Allocation. This might get tricky, especially with multithreading being involved. I'll do another review wrt multithreading |
    let alloc_type: AllocType<'tcx, &'tcx Allocation> =
        tcx.alloc_map.lock().get(alloc_id).expect("no value for AllocId");
    match alloc_type {
        AllocType::Memory(alloc) => {
            if let Some(alloc_pos) = cache(encoder).get(&alloc_id).cloned() {
This would need to be an atomic "get or insert" operation in order to prevent two threads that get here at the same time from both trying to encode `alloc`. (I think this is the same as what @michaelwoerister mentioned below.)
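A minimal sketch of such an atomic get-or-insert, assuming a hypothetical shared cache behind a `Mutex` (none of these names come from the PR):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Looks up `alloc_id` and, if absent, records `pos` for it, all under one
/// lock. Returns the previously cached position if another thread (or an
/// earlier call) already claimed this id.
fn get_or_claim(cache: &Mutex<HashMap<u64, usize>>, alloc_id: u64, pos: usize) -> Option<usize> {
    let mut cache = cache.lock().unwrap();
    match cache.entry(alloc_id) {
        // Someone already encoded (or is encoding) this allocation: reuse it.
        Entry::Occupied(entry) => Some(*entry.get()),
        // We are first: claim the slot; the caller goes on to encode the allocation.
        Entry::Vacant(entry) => {
            entry.insert(pos);
            None
        }
    }
}
```

As the reply below notes, this only becomes relevant if encoding is ever done from multiple threads.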
Has this been addressed?
This operation is effectively atomic since we have unique ownership of the encoder and the cache. This doesn't matter though as encoding isn't intended to be multithreaded.
                AllocKind::AllocAtPos.encode(encoder)?;
                return encoder.emit_usize(alloc_pos);
            }
            let pos = encoder.position();
Yes. I've moved the insertion so that it should handle circular allocation graphs there. |
There was a possible race condition when decoding an `AllocId`. I've changed the way decoding works to deal with this. We now have two caches: one global and one for the current session. The global cache contains a flag which indicates if the `AllocId` has been fully loaded. This PR does not attempt to make encoding thread-safe, as we currently only encode using a single thread. |
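A rough structural sketch of that two-cache scheme, with guessed keying and field names (the PR's real types are not shown in this excerpt):

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

type AllocId = u64; // stand-in for the real interned id type

/// Shared between threads. Maps the position of an encoded allocation to the
/// AllocId reserved for it, plus a flag saying whether an Allocation has
/// already been fully decoded and interned for that id.
struct GlobalDecodingCache {
    entries: Mutex<HashMap<usize, (AllocId, bool)>>,
}

/// Per decoding session and per thread: the set of AllocIds this decoder is
/// currently in the middle of decoding, used to break recursion without
/// affecting what other threads see.
struct LocalDecodingCache {
    in_progress: HashSet<AllocId>,
}
```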
The job failed. Click to expand the log. |
Makes sense.
Yes, at the moment we don't have an encoder that could work concurrently anyway. The travis error suggests that it's trying to decode from an invalid position somewhere. |
☔ The latest upstream changes (presumably #50866) made this pull request unmergeable. Please resolve the merge conflicts. |
The job failed. Click to expand the log. |
Here's a backtrace for the ICE.
|
OK, I think this is the issue @oli-obk has been warning about all along. The code does the following:
|
Oh right, this has nothing to do with recursive allocs, just with a different order of encoding and decoding (of completely unrelated objects that contain the same `AllocId`). |
Though without recursion you can cache the end position, too, and everything should work. (It did for the old impl; it just died on recursion, because you didn't know the end position yet when you got to the actual allocation.) |
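For a single-threaded decoder, that end-position idea could look roughly like this (toy types; the closure stands in for actually decoding the allocation body):

```rust
use std::collections::HashMap;

/// Cache for a single-threaded decoder: position of an encoded allocation ->
/// (the AllocId it decoded to, the position just past its encoding).
struct DecodeCache {
    by_pos: HashMap<usize, (u64, usize)>,
}

/// On the first encounter the body is actually decoded and its end position
/// recorded; on a later encounter we reuse the id and jump straight to `end`,
/// skipping the bytes entirely.
fn decode_at(
    cache: &mut DecodeCache,
    pos: usize,
    decode_body: impl FnOnce(usize) -> (u64, usize),
) -> (u64, usize) {
    if let Some(&(id, end)) = cache.by_pos.get(&pos) {
        return (id, end);
    }
    let (id, end) = decode_body(pos);
    // The end position is only known *after* the body has been decoded, which
    // is exactly why this scheme breaks down for recursive allocations.
    cache.by_pos.insert(pos, (id, end));
    (id, end)
}
```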
Oh that won't work for parallel decoding :( Any ideas @Zoxc ? |
I think we should stick to the table based approach. I'm sure it can be made to work with parallel decoding. |
        // Create an id which is not fully loaded
        (tcx.alloc_map.lock().reserve(), false)
    });
    if fully_loaded || !local_cache(decoder).insert(alloc_id) {
What happens if one thread starts decoding, the next thread takes over the CPU, gets here for the same `AllocId`, skips over, and tries to access the allocation? It'll ICE about an uncached alloc or error with a dangling pointer, right?
In the `fully_loaded` case this isn't a problem since the `AllocId` has an `Allocation` assigned. For the `!local_cache(decoder).insert(alloc_id)` case, we know that some stack frame above us will assign an `AllocId` before the result will be used. Since `local_cache` is thread-local, another thread won't see the value inserted here. It may instead decode the same allocation in parallel.
Ah, neat. Please make this explanation a comment on that `if` statement.
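Something along these lines could serve as that comment, reusing the names from the diff above; the branch body is elided because it is not part of this excerpt:

```rust
// If the AllocId is `fully_loaded`, an Allocation has already been interned
// for it, so decoding the body again is unnecessary. Otherwise, if this
// thread's `local_cache` already contains the id, a stack frame above us on
// *this* thread is in the middle of decoding it and will assign the
// Allocation before the result is used. `local_cache` is thread-local, so
// another thread never observes this entry; it may decode the same
// allocation in parallel, which is redundant but harmless.
if fully_loaded || !local_cache(decoder).insert(alloc_id) {
    // ... skip decoding the allocation body ...
}
```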
The job failed. Click to expand the log. |
src/librustc/mir/interpret/mod.rs
Outdated
    // Write placeholder for size
    let size_pos = encoder.position();
    0usize.encode(encoder)?;
This doesn't work because of variable-length integer encoding.
rustc has some similar code elsewhere and works around this by using a 4-byte array into which the size is encoded. Somewhat space-wasteful, though.
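As a self-contained illustration of that fixed-width workaround, writing into a plain `Vec<u8>` rather than rustc's actual encoder type:

```rust
/// Encodes `write_body`'s output prefixed by its length as a fixed 4-byte
/// little-endian integer, so a reader can skip the body without decoding it.
/// A variable-length integer cannot serve as the placeholder because its
/// width depends on a value we do not know yet.
fn encode_with_size_prefix(out: &mut Vec<u8>, write_body: impl FnOnce(&mut Vec<u8>)) {
    let size_pos = out.len();
    out.extend_from_slice(&[0u8; 4]); // reserve the placeholder

    let body_start = out.len();
    write_body(out);
    let size = (out.len() - body_start) as u32;

    // Patch the placeholder now that the body length is known.
    out[size_pos..size_pos + 4].copy_from_slice(&size.to_le_bytes());
}

fn main() {
    let mut out = Vec::new();
    encode_with_size_prefix(&mut out, |buf| buf.extend_from_slice(b"allocation bytes"));

    // A decoder that wants to skip can read the 4 bytes and jump over `size`.
    let mut size_bytes = [0u8; 4];
    size_bytes.copy_from_slice(&out[0..4]);
    assert_eq!(u32::from_le_bytes(size_bytes) as usize, "allocation bytes".len());
}
```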
It might be simpler to just remember the size in the `global_cache` during decoding?
That doesn't work, because we need this value also when another thread hasn't finished decoding the allocation yet.
    match AllocKind::decode(decoder)? {
        AllocKind::AllocAtPos => {
You could use the trick here that I had originally, where you read a `usize`, and that tag is either 0 for `Alloc`, 1 for `Static`, 2 for `Fn`, or anything else is the `real_pos`. This is also used in `Ty` encoding, I think.
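A sketch of that tag scheme with made-up constants (the real encoding will differ in its details): small values of the leading `usize` name the kind directly, and anything larger is a position, shifted by an offset so it can never collide with the tags.

```rust
const TAG_ALLOC: usize = 0;
const TAG_STATIC: usize = 1;
const TAG_FN: usize = 2;
// Positions are shifted by this offset when encoded so they can never be
// mistaken for one of the small tags above.
const POS_OFFSET: usize = 3;

enum AllocIdTag {
    InlineAlloc,
    Static,
    Fn,
    AtPosition(usize),
}

fn encode_tag(tag: AllocIdTag) -> usize {
    match tag {
        AllocIdTag::InlineAlloc => TAG_ALLOC,
        AllocIdTag::Static => TAG_STATIC,
        AllocIdTag::Fn => TAG_FN,
        AllocIdTag::AtPosition(pos) => pos + POS_OFFSET,
    }
}

fn decode_tag(value: usize) -> AllocIdTag {
    match value {
        TAG_ALLOC => AllocIdTag::InlineAlloc,
        TAG_STATIC => AllocIdTag::Static,
        TAG_FN => AllocIdTag::Fn,
        pos => AllocIdTag::AtPosition(pos - POS_OFFSET),
    }
}
```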
Before @Zoxc does more work here, we should decide whether [...] is an option. It is certainly the more easily grokked option. What exactly is blocking that solution from allowing parallel decoding? Isn't the table-based solution essentially equivalent to this solution, but it's |
@oli-obk, do you remember why exactly we switched to the table-based approach? Because you already had the skipping implemented but that may not have been sufficient for all cases. |
The table-based approach also might work better if we ever want to make the cache updateable in-place. I know that was one of the reasons why we wanted it. |
The job failed. Click to expand the log. |
@TimNN @rust-highfive log is empty |
The skipping failed for recursive cases; that's where we decided to scrap the approach for the table-based version, since IIRC that's what we wanted all along.
Oh yea, that would definitely require an indirection like the one used currently |
|
FYI, I'm looking into an alternative implementation of this right now. |
WIP: Make const decoding thread-safe. This is an alternative to #50957. It's a proof of concept (e.g. it doesn't adapt metadata decoding, just the incr. comp. cache) but I think it turned out nice. It's rather simple and does not require passing around a bunch of weird closures, like we currently do. If you (@Zoxc & @oli-obk) think this approach is good then I'm happy to finish and clean this up. Note: The current version just spins when it encounters an in-progress decoding. I don't have a strong preference for this approach. Decoding concurrently is equally fine by me (or maybe even better because it doesn't require poisoning). r? @Zoxc
☔ The latest upstream changes (presumably #51060) made this pull request unmergeable. Please resolve the merge conflicts. |
The alternative to this PR (#51060) has been merged. |
This builds on top of #50520.
cc @michaelwoerister
r? @oli-obk