Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Fix unsoundness in EntityMut::world_scope #7387

Closed
wants to merge 17 commits into from

Conversation

JoJoJet
Copy link
Member

@JoJoJet JoJoJet commented Jan 27, 2023

Objective

Found while working on #7385.

The struct EntityMut has the safety invariant that it's cached EntityLocation must always accurately specify where the entity is stored. Thus, any time its location might be invalidated (such as by calling EntityMut::world_mut and moving archetypes), the cached location must be updated by calling EntityMut::update_location.

The method world_scope encapsulates this pattern in safe API by requiring world mutations to be done in a closure, after which update_location will automatically be called. However, this method has a soundness hole: if a panic occurs within the closure, then update_location will never get called. If the panic is caught in an outer scope, then the EntityMut will be left with an outdated location, which is undefined behavior.

An example of this can be seen in the unit test entity_mut_world_scope_panic, which has been added to this PR as a regression test. Without the other changes in this PR, that test will invoke undefined behavior in safe code.

Solution

Call EntityMut::update_location() from within a Drop impl, which ensures that it will get executed even if EntityMut::world_scope unwinds.

@JoJoJet JoJoJet added A-ECS Entities, components, systems, and events P-Unsound A bug that results in undefined compiler behavior labels Jan 27, 2023
@alice-i-cecile alice-i-cecile added this to the 0.10 milestone Jan 27, 2023
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Nice catch. We might want to audit the other areas of bevy_ecs that use callbacks like this.

crates/bevy_ecs/src/world/entity_ref.rs Show resolved Hide resolved

let mut entity = world.spawn_empty();
let id = entity.id();
let res = std::panic::catch_unwind(AssertUnwindSafe(|| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the AssertUnwindSafe by making the closure move?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would move the variable entity into the closure and make it inaccessible to the rest of the test.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, that's subtle. Good explanation and test though.

I don't have full expertise to decide if this is the best possible solution though.

@alice-i-cecile alice-i-cecile added the D-Complex Quite challenging from either a design or technical perspective. Ask for help! label Jan 27, 2023
@BoxyUwU
Copy link
Member

BoxyUwU commented Jan 27, 2023

imo it should be:

    /// Gives mutable access to this `EntityMut`'s [`World`] in a temporary scope.
    pub fn world_scope<T>(&mut self, f: impl FnOnce(&mut World) -> T) -> T {
        let result = std::panic::catch_unwind(AssertUnwindSafe(|| f(self.world)));
        self.update_location();
        result.unwrap_or_else(|e| std::panic::panic_any(e))
    }

this PR's current unsafe code is tbh incredibly sketchy, im so surprised that it doesnt throw a miri error, and if you tweak it a little it -does-. plus many smart people who know how stacked borrows work were also confused why it didnt error (hard to state how funny this is tbh).

@JoJoJet
Copy link
Member Author

JoJoJet commented Jan 27, 2023

That's definitely a much cleaner solution. I didn't realize something that straightforward was possible.

I take it BlobVec::replace_unchecked could use a similar approach?

@james7132
Copy link
Member

I'm curious what impact this has on performance, particularly since we use world_scope everywhere in the hierarchy commands.

@JoJoJet
Copy link
Member Author

JoJoJet commented Jan 28, 2023

What would be the most relevant benchmarks for measuring that?

@JoJoJet
Copy link
Member Author

JoJoJet commented Jan 28, 2023

I ended up switching to the approach suggested by @hymm, which I like more after thinking about it. This approach should have better codegen since it doesn't pull in the panic machinery (correct me if I'm wrong).

@james7132
Copy link
Member

I ended up switching to the approach suggested by @hymm, which I like more after thinking about it. This approach should have better codegen since it doesn't pull in the panic machinery (correct me if I'm wrong).

This solution definitely is cleaner without the unsafe and shouldn't force writing out a panic handler on every call. Should work just fine.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 28, 2023
crates/bevy_ecs/src/world/entity_ref.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/world/entity_ref.rs Outdated Show resolved Hide resolved
impl Drop for Guard<'_, '_> {
#[inline]
fn drop(&mut self) {
self.entity_mut.update_location();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document that world_scope() calls update_location()? This was missing in the original code too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a conscious choice not to mention unsafe implementation details in the docs of a safe method

Copy link
Contributor

@djeedai djeedai Jan 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. But on the other hand if I see the update_location() on docs.rs then I will immediately ask myself "should I use that after world_scope()?", and safety doesn't answer my question; if anything I'd be tempted to think I do need it since it's unsafe but world_scope() is not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for EntityMut::update_location say that it only needs to be called if you are using EntityMut::world_mut. Perhaps it could be phrased differently to make that more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that should be fine.

Copy link
Contributor

@djeedai djeedai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes thanks for fixing the doc, that was the wrong reference indeed.

@james7132
Copy link
Member

bors r+

bors bot pushed a commit that referenced this pull request Jan 29, 2023
# Objective

Found while working on #7385.

The struct `EntityMut` has the safety invariant that it's cached `EntityLocation` must always accurately specify where the entity is stored. Thus, any time its location might be invalidated (such as by calling `EntityMut::world_mut` and moving archetypes), the cached location *must* be updated by calling `EntityMut::update_location`.

The method `world_scope` encapsulates this pattern in safe API by requiring world mutations to be done in a closure, after which `update_location` will automatically be called. However, this method has a soundness hole: if a panic occurs within the closure, then `update_location` will never get called. If the panic is caught in an outer scope, then the `EntityMut` will be left with an outdated location, which is undefined behavior.

An example of this can be seen in the unit test `entity_mut_world_scope_panic`, which has been added to this PR as a regression test. Without the other changes in this PR, that test will invoke undefined behavior in safe code.

## Solution

Call `EntityMut::update_location()` from within a `Drop` impl, which ensures that it will get executed even if `EntityMut::world_scope` unwinds.
@bors bors bot changed the title Fix unsoundness in EntityMut::world_scope [Merged by Bors] - Fix unsoundness in EntityMut::world_scope Jan 29, 2023
@bors bors bot closed this Jan 29, 2023
@JoJoJet JoJoJet deleted the world-scope-unsound branch January 29, 2023 02:30
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

Found while working on bevyengine#7385.

The struct `EntityMut` has the safety invariant that it's cached `EntityLocation` must always accurately specify where the entity is stored. Thus, any time its location might be invalidated (such as by calling `EntityMut::world_mut` and moving archetypes), the cached location *must* be updated by calling `EntityMut::update_location`.

The method `world_scope` encapsulates this pattern in safe API by requiring world mutations to be done in a closure, after which `update_location` will automatically be called. However, this method has a soundness hole: if a panic occurs within the closure, then `update_location` will never get called. If the panic is caught in an outer scope, then the `EntityMut` will be left with an outdated location, which is undefined behavior.

An example of this can be seen in the unit test `entity_mut_world_scope_panic`, which has been added to this PR as a regression test. Without the other changes in this PR, that test will invoke undefined behavior in safe code.

## Solution

Call `EntityMut::update_location()` from within a `Drop` impl, which ensures that it will get executed even if `EntityMut::world_scope` unwinds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events D-Complex Quite challenging from either a design or technical perspective. Ask for help! P-Unsound A bug that results in undefined compiler behavior S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants