Change BTreeMap to use parent pointers #27865
Comments
All nodes are boxed, so this is trivially true? |
|
Well, the root node definitely isn't boxed. I guess the non-root ones do have a stable address, though, because their parent nodes' allocations never change (as far as I can tell). |
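For readers following along, here is a minimal sketch of why boxing matters for parent pointers. The `Node` type and the `addr_of_boxed` helper are hypothetical stand-ins, not the actual BTreeMap code. The assumption being illustrated is that moving a `Box` moves only the pointer, so a raw pointer into the heap allocation stays valid, whereas an inline (unboxed) root may change address whenever the map itself is moved.

```rust
// Hypothetical stand-in for a B-tree node.
struct Node {
    len: u16,
}

/// Returns the heap address of the node a box points to.
fn addr_of_boxed(b: &Box<Node>) -> *const Node {
    &**b as *const Node
}

fn main() {
    let boxed = Box::new(Node { len: 0 });
    let before = addr_of_boxed(&boxed);

    // Moving the Box (here, into a Vec) moves only the pointer;
    // the heap allocation holding the Node stays where it is.
    let mut holder = Vec::new();
    holder.push(boxed);
    let after = addr_of_boxed(&holder[0]);

    assert_eq!(before, after);
    println!("boxed node (len {}) stayed at {:p}", holder[0].len, after);
}
```

A node stored inline in the map struct has no such guarantee, which is the concern raised about the root.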
Oh whoops, yes you're right. I forgot that we did that. I fear that without ManuallyDrop we're going to have to go Full HashMap and manage all of our allocations/destructing. |
Oh wait no we're 90% of the way to Full HashMap anyway. I'm remembering the good ol' days... |
Auuuugh all of this is coming back to me. Basically it seems to me that you "want"
But this design forbids you from not allocating space for your edges in your leaves. This is why we currently have (effectively):
Which enables the edges to not be allocated (doubling as is_leaf). We then also abuse this fact to keep the root inline, and avoid an extra allocation/indirection. We could also consider a hybrid:
But this has the negative side-effect of making the edges array enormous. I think Google's btree does the following:
And then based on the value of is_leaf, may cast the BaseNode to an InternalNode. This works because a Leaf never becomes Internal or vice-versa. So leaves only allocate the space they need, and internal nodes allocate more but get type-erased to "only" leaves. And then blah blah figure out deallocation yourself whatever. |
The type-erasure approach seems best, but as I understand it, it's technically undefined behavior for us to transmute between the two struct types, right? |
Not if they're repr(C) and one is a prefix of the other. You would be transmuting the pointers, of course. |
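A rough sketch of the type-erasure idea, assuming two `#[repr(C)]` structs where the leaf layout is the first field of the internal layout. The names `BaseNode`, `new_leaf`, `new_internal`, `total_len` and `free`, and the fan-out of 2, are illustrative only and are not taken from any existing implementation.

```rust
#[repr(C)]
struct BaseNode {
    len: u16,
    is_leaf: bool,
}

#[repr(C)]
struct InternalNode {
    base: BaseNode,            // shared prefix: must be the first field
    edges: [*mut BaseNode; 2], // tiny fan-out, for illustration only
}

fn new_leaf(len: u16) -> *mut BaseNode {
    Box::into_raw(Box::new(BaseNode { len, is_leaf: true }))
}

fn new_internal(len: u16, edges: [*mut BaseNode; 2]) -> *mut BaseNode {
    let node = Box::new(InternalNode { base: BaseNode { len, is_leaf: false }, edges });
    // Erase the type: callers only ever see a pointer to the prefix.
    Box::into_raw(node) as *mut BaseNode
}

/// Sums `len` over a subtree. Caller must pass a pointer from the constructors above.
unsafe fn total_len(node: *mut BaseNode) -> usize {
    unsafe {
        let mut total = (*node).len as usize;
        if !(*node).is_leaf {
            // Sound only because the node was allocated as an InternalNode.
            let internal = node as *mut InternalNode;
            for edge in (*internal).edges {
                total += total_len(edge);
            }
        }
        total
    }
}

/// Frees a subtree, reconstructing each Box with the type it was allocated as.
unsafe fn free(node: *mut BaseNode) {
    unsafe {
        if (*node).is_leaf {
            drop(Box::from_raw(node));
        } else {
            let internal = Box::from_raw(node as *mut InternalNode);
            for edge in internal.edges {
                free(edge);
            }
        }
    }
}

fn main() {
    let root = new_internal(1, [new_leaf(2), new_leaf(3)]);
    unsafe {
        assert_eq!(total_len(root), 6);
        free(root);
    }
}
```

Only pointers are cast, never the structs themselves, and deallocation has to branch on `is_leaf` so each allocation is freed with its original layout.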
OK, I got a prototype working with type-erasure and parent-pointer insertion, though it's not using the existing BTree code. I need to play around with it more in isolation before attempting a PR. |
How important is it to keep the ability to specify |
I'm leaning toward the following representation:
const T: usize = 6;
pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}
#[repr(C)]
struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
    is_leaf: bool,
}
#[repr(C)]
struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: [*mut Node<K, V>; 2 * T],
}
We don't need |
Being able to specify B is totally unimportant and is slated to be deprecated. Also this layout wastes an extra ptr of space over using a u8 for len and idx on 32-bit (no effect on 64bit). As a minor bikeshed I'd maybe call idx "parent_idx" or whatever. |
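To make the size remark concrete, here is a small sketch that measures only the trailing fields (`parent`, `len`, `idx`, `is_leaf`). It assumes the key and value arrays are left out and that `u32` and `u64` can stand in for a 32-bit and a 64-bit pointer; the generic `Header` struct is hypothetical.

```rust
use std::mem::size_of;

// P stands in for the parent pointer (u32 ~ 32-bit target, u64 ~ 64-bit target);
// N is the integer type used for `len` and `idx`.
#[repr(C)]
struct Header<P, N> {
    parent: P,
    len: N,
    idx: N,
    is_leaf: bool,
}

fn main() {
    println!("32-bit pointer, u16 len/idx: {} bytes", size_of::<Header<u32, u16>>());
    println!("32-bit pointer, u8  len/idx: {} bytes", size_of::<Header<u32, u8>>());
    println!("64-bit pointer, u16 len/idx: {} bytes", size_of::<Header<u64, u16>>());
    println!("64-bit pointer, u8  len/idx: {} bytes", size_of::<Header<u64, u8>>());
}
```

With a 4-byte pointer the `u16` variant needs 9 bytes before padding and rounds up to 12, while the `u8` variant needs 7 and rounds up to 8, so the difference is one pointer width. With an 8-byte, 8-aligned pointer both variants should round up to the same size.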
Oh but otherwise 👍 this is basically what I had in mind. |
I was aware of the waste, but unsure whether we wanted to support What do we have to be aware of with zero-sized types here? |
Zero-sized types should Just Work in this case. Particularly the parent pointer ensures a nice minimum alignment. |
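A quick way to check the zero-sized case, as a sketch only: the `Node` below follows the proposed layout but, as a simplifying assumption, points `parent` at `Node` itself rather than at `InternalNode`.

```rust
use std::mem::{align_of, size_of};

const T: usize = 6;

#[repr(C)]
struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    parent: *mut Node<K, V>, // simplification: the proposal uses *mut InternalNode
    len: u16,
    idx: u16,
    is_leaf: bool,
}

fn main() {
    // Arrays of zero-sized elements take no space at all...
    assert_eq!(size_of::<[(); 2 * T - 1]>(), 0);
    // ...but the node still has a non-zero size and at least pointer alignment,
    // because of the parent pointer and the bookkeeping fields.
    assert!(size_of::<Node<(), ()>>() >= size_of::<*mut u8>());
    assert!(align_of::<Node<(), ()>>() >= align_of::<*mut u8>());
    println!(
        "Node<(), ()>: size {}, align {}",
        size_of::<Node<(), ()>>(),
        align_of::<Node<(), ()>>()
    );
}
```

So a node of zero-sized keys and values is still a real, non-empty allocation.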
Here are some preliminary benchmarks for just insertion:
|
Fantastic!!! |
The problem is patching the new stuff into the existing code; it's likely easier to just rewrite the entire module than change things piece by piece. |
I have no problem with that, as long as you're down to do it. |
Whoa, impressive gains! |
I will be implementing this as a standalone library before attempting a patch: https://github.com/apasel422/btree (currently empty) |
An alternative to the
const T: usize = 6;
pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}
struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
}
struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: EdgeData<K, V>,
}
enum EdgeData<K, V> {
    Leaves([Box<Node<K, V>>; 2 * T]),
    Internals([Box<InternalNode<K, V>>; 2 * T]),
} |
@gereeter But can't each of the sub-nodes be either a leaf or an internal node? |
@gereeter In your example, given an |
@eddyb does that actually accomplish anything? We have a bunch of extra bytes in the node struct anyway. |
@eddyb All leaves are at the same depth in a b-tree |
@apasel422 You can't figure it out. However, you could have |
@gereeter What I mean is, how would you use that to perform an insertion? The root in your example is a |
@apasel422 Oh, whoops. You would need to do the same sort of branching at the root instead of just storing a |
Note: I think that the ideal representation, in pseudo-Rust, is the following.
struct BTreeMap<K, V> {
    height: usize,
    root: Option<Box<Node<K, V, height>>>
}
struct Node<K, V, height: usize> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    edges: if height > 0 {
        [Box<Node<K, V, height - 1>>; 2 * T]
    } else { () },
    parent: *mut Node<K, V, height + 1>,
    len: u16,
    idx: u16,
}
Essentially, we don't store |
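Real Rust cannot express a type that depends on a runtime `height`, so one possible approximation is to keep `height` in the map and count down while descending. The sketch below assumes that approach; `Tree`, `LeafNode`, `push_level`, `leftmost_leaf_key`, and the single `first_edge` field are hypothetical simplifications, not code from either work-in-progress repository.

```rust
#[repr(C)]
struct LeafNode {
    first_key: u32, // stand-in for the keys/vals arrays
}

#[repr(C)]
struct InternalNode {
    data: LeafNode,            // shared prefix: must be the first field
    first_edge: *mut LeafNode, // stand-in for the edges array
}

struct Tree {
    height: usize, // 0 means the root is a leaf
    root: *mut LeafNode,
}

impl Tree {
    fn new_leaf(key: u32) -> Tree {
        Tree { height: 0, root: Box::into_raw(Box::new(LeafNode { first_key: key })) }
    }

    /// Puts a new internal root above the current root, growing the height by one.
    fn push_level(&mut self, key: u32) {
        let node = Box::new(InternalNode {
            data: LeafNode { first_key: key },
            first_edge: self.root,
        });
        self.root = Box::into_raw(node) as *mut LeafNode;
        self.height += 1;
    }

    /// Follows the first edge down to a leaf; no per-node is_leaf flag is consulted.
    fn leftmost_leaf_key(&self) -> u32 {
        let mut node = self.root;
        let mut height = self.height;
        while height > 0 {
            // height > 0 means this node was allocated as an InternalNode.
            node = unsafe { (*(node as *mut InternalNode)).first_edge };
            height -= 1;
        }
        unsafe { (*node).first_key }
    }
}

impl Drop for Tree {
    fn drop(&mut self) {
        let mut node = self.root;
        let mut height = self.height;
        while height > 0 {
            let internal = unsafe { Box::from_raw(node as *mut InternalNode) };
            node = internal.first_edge;
            height -= 1;
        }
        unsafe { drop(Box::from_raw(node)) };
    }
}

fn main() {
    let mut tree = Tree::new_leaf(7);
    tree.push_level(10);
    tree.push_level(20);
    assert_eq!(tree.height, 2);
    assert_eq!(tree.leftmost_leaf_key(), 7);
}
```

This relies on all leaves being at the same depth, so a single counter is enough to know which layout each node has.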
@gereeter Is it possible that DSTs could help there, since nodes will have to be behind a pointer anyway? |
@apasel422 You could use |
Preliminary forward by-ref iteration benches:
|
Huge gains! |
Preliminary results from my code that doesn't use
To be honest, I'm not exactly sure what to make of this:
|
My WIP implementation is at https://github.com/gereeter/btree-rewrite. |
@gereeter Perhaps we can combine forces to tackle this. |
@apasel422 Yes, that would be a good idea. However, I'm not sure which approach should be used, and unfortunately I don't see any good way of combining our codebases because of all the mechanisms I use to control unsafety that simply aren't necessary in your codebase. |
Sorry to nag, but it looks like progress has halted on all fronts. Is anyone still working on this? |
Sorry for the stagnation. I was (and still am) fairly near completion, just needing to finish removal, but unfortunately schoolwork has been keeping me very busy and I haven't been thinking about this. |
@gereeter nice to hear that, did you and @apasel422 figure out the best model for the leaves? |
My benchmarks seemed to rule in favor of not using |
Despite being over 700 lines shorter, this implementation should use less memory than the previous one and is faster on at least insertions and iteration, the latter improving approximately 5x. Technically a [breaking-change] due to removal of deprecated functions. cc @apasel422 @gankro @goyox86 Fixes #27865. Review on Reviewable: https://reviewable.io/reviews/rust-lang/rust/30426
This will require nodes to have a stable address.
CC #26227 @gankro.