Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree arena #752

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Tree arena #752

wants to merge 15 commits into from

Conversation

KGrewal1
Copy link

I've had a try at implementing the unsafe tree_arena in a separate lib (but in the same workspace for now) - haven't thought of a good way to make non root access not O(depth), without slowing down insertion or using more memory, though have improved accessing nodes from the root to O(1) from O(depth) and accessing direct children to O(1) from O(children)

@KGrewal1
Copy link
Author

KGrewal1 commented Nov 17, 2024

I am definitely doing something silly to break the needed preconditions, just need to work out what

Edit:
Think its just the reference to the Item being moved on insertion of new items, as its a reference to an element of the hashmap which can be invalidated

@PoignardAzur
Copy link
Contributor

PoignardAzur commented Nov 18, 2024

haven't thought of a good way to make non root access not O(depth), without slowing down insertion or using more memory,

The eventual plan will be to use a slotmap.

EDIT: Wait, no, I misunderstood the question.

@PoignardAzur
Copy link
Contributor

(Copying from zulip)

Your PR looks good, but we'd probably want more intermediary steps before relying on custom unsafe code; especially since the TreeArena API isn't quite settled yet.

Those steps would be:

  • Moving TreeArena to a separate crate without changing the code.
  • Adding a test suite.
  • Adding a command-line flag so we can switch between the safe implementation and the efficient implementation.

The test suite would help us test the unsafe version with MIRI, which we'd probably add to CI.

Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very high-quality contribution, and will be good to merge once the review points are addressed.


Something orthogonal to this review is the question of how the TreeArena API will evolve.

Having an unsafe implementation may slow us down if we want to try API changes and we have to make sure they're implemented in both versions.

I'd like to establish that the safe version will be a first-class citizen and the unsafe version a second-class citizen:

  • The safe version may have features / APIs that the unsafe version doesn't yet have.
  • If both versions are at feature parity, Masonry can switch on the unsafe version for best performance.
  • Otherwise, Masonry uses the safe version.

If that's what we go with, we should documented that pattern in a few places.

Comment on lines 3 to 4
## Architecture

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, the point of an ARCHITECTURE.md file is to give just enough context to understand everything about a codebase, not just the public API.

As such, I think this file should start with a description of the safe-and-unsafe implementations and why we include both.

Comment on lines 29 to 33
Of finding children: $O(1)$ - previously $O(\text{children})$

Of finding deeper descendants: $O(\text{depth})$ - ideally will be made $O(1)$

Access from the root: $O(1)$, previously $O(\text{depth})$ - improved as all nodes are known to be descended from the root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid Latex for this section. ARCHITECTURE.md is mostly meant to be read in a code editor, you can use plain text.

Also, I think the descriptions could be a little clearer.

Copy link
Member

@DJMcNab DJMcNab Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see an argument for using the GitHub Markdown MathJax extension. But for big-O notation, $$O(1)$$ or $$\mathcal{O}(1)$$ is not that much better than O(1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thinking, given readme is so bare, does it make more sense to flesh this out more, but also move it into Readme rather than having a superfluous file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The general workflow I like is "README for things you need to understand the project, ARCHITECTURE for things you need to understand the project's internals that you wouldn't get from the README or the doc root".

Comment on lines 13 to 14
It is possible to get shared (immutable) access or exclusive (mutable) access to the tree. These return `ArenaRef<'arena, T>` or `ArenaMut<'arena, T>` respectively

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this description could go into a little more detail about how the shared/exclusive access works, and why this crate is worth bothering with.

The general idea of TreeArena is to express a tree-like ownership graph, stored inside a flat data structure. The UnsafeCells are meant to bridge the gap between the two.

tree_arena/src/lib.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_safe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/tests/basic_tests.rs Outdated Show resolved Hide resolved
masonry/src/contexts.rs Outdated Show resolved Hide resolved
@KGrewal1
Copy link
Author

Regarding testing with miri in CI, the invocation I think is:

cargo +nightly miri t -p tree_arena --no-fail-fast --no-default-features  

but I'm unsure how best to access nightly rust in the workflow to run this

Comment on lines 172 to 177
/// # SAFETY
///
/// When using this on [`ArenaMutChildren`] associated with some node,
/// must ensure that `id` is a descendant of that node, otherwise can
/// obtain two mutable references to the same node
unsafe fn find_mut(&mut self, id: impl Into<NodeId>) -> Option<ArenaMut<'_, T>> {
Copy link

@Imberflur Imberflur Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I have a minor note on the safety documentation. This is sort of a comment out of nowhere, but I've developed an interest in reviewing unsafe code and I've been following the development in this repo.

Since this retains the full &mut self inside the returned ArenaMutChildren, I think the safety requirements are more extensive than the case described here. Any use of parent_arena in ArenaMutChildren needs to be careful to not invalidate the reference produced from unsafe { item.get().as_mut() } below. E.g. DataMap::find also must only be called on descendants, parent_arena.items.remove() must only operate on descendants, parent_arena.items.insert() must not overwrite existing nodes, parent_arena.items must be used to clear the whole structure, etc.

I was initially thinking the safety documentation could be reworked to note the overall requirement that the parent_arena field must not be used to invalidate the returned item reference and then include a few examples. However, I think all call sites would have the same safety note pointing to the details of ArenaMutChildren. So since all the details of ArenaMutChildren are private to this module, find_mut could be safe, and the unsafe block in it could be documented with the fact that all operations that ArenaMutChildren allows only access descendants and never invalidate the item reference created here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the overall point that the safety requirement should be better put as that you can only access children or remove children of the current node (I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key, and not sure clearing is different to removal as there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding safety, I think this makes sense, but can also see another argument that it being unsafe is correct in the case it did become public (though assume then the same should be true of the immutable method otherwise could obtain a immutable and mutable reference to the same node)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key,

there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself

These are comments about what the safe code in ArenaMutChildren is allowed to do with the internal items hashmap (which has things such as insert without such checks and clear).

Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending changing the find methods and removing the children lists.

You can consider everything else optional before merging.

tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Show resolved Hide resolved
tree_arena/tests/basic_tests.rs Show resolved Hide resolved
@PoignardAzur
Copy link
Contributor

Another blocker before this is merged: I'd like for some version of my comment to show up in the documentation:

Having an unsafe implementation may slow us down if we want to try API changes and we have to make sure they're implemented in both versions.

I'd like to establish that the safe version will be a first-class citizen and the unsafe version a second-class citizen:

* The safe version may have features / APIs that the unsafe version doesn't yet have.

* If both versions are at feature parity, Masonry can switch on the unsafe version for best performance.

* Otherwise, Masonry uses the safe version.

If that's what we go with, we should documented that pattern in a few places.

@KGrewal1
Copy link
Author

Added to the readme and the doc comment at the root of the crate

Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'd maybe like a comment noting that TreeNode::children needs to be removed but otherwise we're good.

@KGrewal1
Copy link
Author

KGrewal1 commented Nov 29, 2024

I'm not sure TreeNode::children can be removed yet as it is being used on node removal (as we remove all descended nodes) - I think the alternative would be iterating over the whole hashmap to find which children have the current node as their parent but that would definitely be inefficient?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants