Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change BTreeMap to use parent pointers #27865

Closed
apasel422 opened this issue Aug 17, 2015 · 42 comments
Closed

Change BTreeMap to use parent pointers #27865

apasel422 opened this issue Aug 17, 2015 · 42 comments

Comments

@apasel422
Copy link
Contributor

This will require nodes to have a stable address.

CC #26227 @gankro.

@Gankra
Copy link
Contributor

Gankra commented Aug 17, 2015

All nodes are boxed, so this is trivially true?

@apasel422
Copy link
Contributor Author

BTreeMap directly contains a Node, and actually no nodes are boxed. They have fields for the allocated keys/vals/edges instead of being boxed themselves.

@apasel422
Copy link
Contributor Author

Well, the root node definitely isn't boxed. I guess the non-root ones do have a stable address, though, because their parent nodes' allocations never change (as far as I can tell).

@Gankra
Copy link
Contributor

Gankra commented Aug 17, 2015

Oh whoops, yes you're right. I forgot that we did that.

I fear that without ManuallyDrop we're going to have to go Full HashMap and manage all of our allocations/destructing.

@Gankra
Copy link
Contributor

Gankra commented Aug 17, 2015

Oh wait no we're 90% of the way to Full HashMap anyway. I'm remembering the good ol' days...

@Gankra
Copy link
Contributor

Gankra commented Aug 17, 2015

Auuuugh all of this is coming back to me.

Basically it seems to me that you "want"

Node {
  len: u8,
  parent_idx: u8,
  is_leaf: bool,
  parent: *mut Node,
  keys: ManuallyDrop<[K; n]>
  vals: ManuallyDrop<[V; n]>
  edges: ManuallyDrop<[Box<Node>; n + 1]> 
}

Map {
  root: Option<Box<Node>>
}

But this design forbids you from not allocating space for your edges in your leaves. This is why we currently have (effectively):

Node {
    len: u8,
    // Actually just Uniques that we manually allocate and manage
    keys: Box<ManuallyDrop<[K; n]>>,
    vals: Box<ManuallyDrop<[V; n]>>,
    edges: Option<Box<ManuallyDrop<[Node; n + 1]>>>,
}

Which enables the edges to not be allocated (doubling as is_leaf). We then also abuse this fact to keep the root inline, and avoid an extra allocation/indirection.

We could also consider a hybrid:

Node {
    len: u8,
    keys: [K; n],
    vals: [V; n],
    edges: Option<Box<ManuallyDrop<[Node; n + 1]>>>,
}

But this has the negative side-effect of making the edges array enormous.

I think google's btree does the following:

struct BaseNode {
   is_leaf: bool,
   keys: ManuallyDrop<[K; n]>
   vals: ManuallyDrop<[V; n]>
   // and other common stuff like parent pointers
}

// Inherit from BaseNode
struct InternalNode: BaseNode {
   edges: ManuallyDrop<[Unique<BaseNode>; n+1]>
}

And then based on the value of is_leaf, may cast the BaseNode to an InternalNode.

This works because a Leaf never becomes Internal or vice-versa. So leaves only allocate the space they need, and internal nodes allocate more but get type-erased to "only" leaves. And then blah blah figure out deallocation yourself whatever.

@apasel422
Copy link
Contributor Author

The type-erasure approach seems best, but as I understand it, it's technically undefined behavior for us to transmute between the two struct types, right?

@Gankra
Copy link
Contributor

Gankra commented Aug 17, 2015

Not if they're repr(C) and one is a prefix of the other. You would be transmuting the pointers, of course.

@apasel422
Copy link
Contributor Author

OK, I got a prototype working with type-erasure and parent-pointer insertion, though it's not using the existing BTree code. I need to play around with it more in isolation before attempting a PR.

@apasel422
Copy link
Contributor Author

How important is it to keep the ability to specify b? It seems like we're waiting for type-level integers (or associated constants) to land before stabilizing that functionality anyway.

@apasel422
Copy link
Contributor Author

I'm leaning toward the following representation:

const T: usize = 6;

pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}

#[repr(C)]
struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V: 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
    is_leaf: bool,
}

#[repr(C)]
struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: [*mut Node<K, V>; 2 * T],
}

We don't need ManuallyDrop right now, because we can just take the root and mem::forget the node after running the key and value destructors and recursively dropping any edges. We should be able to easily replace the T const with an integer parameter when those are ready.

@Gankra
Copy link
Contributor

Gankra commented Aug 18, 2015

Being able to specify B is totally unimportant and is slated to be deprecated.

Also this layout wastes an extra ptr of space over using a u8 for len and idx on 32-bit (no effect on 64bit).

As a minor bikeshed I'd maybe call idx "parent_idx" or whatever.

@Gankra
Copy link
Contributor

Gankra commented Aug 18, 2015

Oh but otherwise 👍 this is basically what I had in mind.

@apasel422
Copy link
Contributor Author

I was aware of the waste, but unsure whether we wanted to support B > 255. u8 should suffice, though. (In a future Rust, we might even be able to determine how big to make {len, parent_idx} based on B.)

What do we have to be aware of with zero-sized types here?

@Gankra
Copy link
Contributor

Gankra commented Aug 18, 2015

Zero-sized types should Just Work in this case. Particularly the parent pointer ensures a nice minimum alignment.

@apasel422
Copy link
Contributor Author

Here are some preliminary benchmarks for just insertion:

test rand_100000_parent ... bench:         172 ns/iter (+/- 9)
test rand_100000_std    ... bench:         208 ns/iter (+/- 14)
test rand_10000_parent  ... bench:         109 ns/iter (+/- 8)
test rand_10000_std     ... bench:         146 ns/iter (+/- 8)
test rand_100_parent    ... bench:          63 ns/iter (+/- 10)
test rand_100_std       ... bench:          84 ns/iter (+/- 17)
test seq_100000_parent  ... bench:          73 ns/iter (+/- 5)
test seq_100000_std     ... bench:         115 ns/iter (+/- 7)
test seq_10000_parent   ... bench:          58 ns/iter (+/- 3)
test seq_10000_std      ... bench:          84 ns/iter (+/- 14)
test seq_100_parent     ... bench:          21 ns/iter (+/- 2)
test seq_100_std        ... bench:          44 ns/iter (+/- 5)

@Gankra
Copy link
Contributor

Gankra commented Aug 19, 2015

Fantastic!!!

@apasel422
Copy link
Contributor Author

The problem is patching the new stuff into the existing code; it's likely easier to just rewrite the entire module than change things piece by piece.

@Gankra
Copy link
Contributor

Gankra commented Aug 19, 2015

I have no problem with that, as long as you're down to do it.

@arthurprs
Copy link
Contributor

Woa, impressive gains!

@apasel422
Copy link
Contributor Author

I will be implementing this as a standalone library before attempting a patch: https://github.com/apasel422/btree (currently empty)

@gereeter
Copy link
Contributor

An alternative to the #[repr(C)] trick would be to store the is_leaf information a level up, with the edges, as follows:

const T: usize = 6;

pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}

struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V: 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
}

struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: EdgeData<K, V>,
}

enum EdgeData<K, V> {
    Leaves([Box<Node<K, V>>; 2 * T]),
    Internals([Box<InternalNode<K, V>>; 2 * T]),
}

@eddyb
Copy link
Member

eddyb commented Aug 21, 2015

@gereeter But can't each of the sub-nodes be either a leaf or an internal node?
One nice trick here would be to use the lowest bit to indicate whether a node is internal or a leaf, though you can't do it in entirely safe code just yet (it's not even an optimization you can do without opting out of being able to take references to a Box<Node> field).

@apasel422
Copy link
Contributor Author

@gereeter In your example, given an &Node<K, V>, how do you know if it's internal or a leaf?

@Gankra
Copy link
Contributor

Gankra commented Aug 21, 2015

@eddyb does that actually accomplish anything? We have a bunch of extra bytes in the node struct anyway.

@bluss
Copy link
Member

bluss commented Aug 21, 2015

@eddyb All leaves are at the same depth in a b-tree

@gereeter
Copy link
Contributor

@apasel422 You can't figure it out. However, you could have NodeRef types that keep track of that information.

@apasel422
Copy link
Contributor Author

@gereeter What I mean is, how would you use that to perform an insertion? The root in your example is a Node.

@gereeter
Copy link
Contributor

@apasel422 Oh, whoops. You would need to do the same sort of branching at the root instead of just storing a Node.

@gereeter
Copy link
Contributor

Note: I think that the ideal representation, in pseudo-Rust, is the following.

struct BTreeMap<K, V> {
    height: usize,
    root: Option<Box<Node<K, V, height>>>
}

struct Node<K, V, height: usize> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    edges: if height > 0 {
        [Box<Node<K, V, height - 1>>; 2 * T]
    } else { () }
    parent: *mut Node<K, V, height + 1>,
    len: u16,
    idx: u16,
}

Essentially, we don't store is_leaf information anywhere except for the root and simply infer this information everwhere from the root. I'm trying to write an implementation that uses this representation, but it is difficult to keep the unsafety under control.

@apasel422
Copy link
Contributor Author

@gereeter Is it possible that DSTs could help there, since nodes will have to be behind a pointer anyway?

@gereeter
Copy link
Contributor

@apasel422 You could use edges: [Box<Node<K, V>>] and check the length of that array to see if you are in a leaf or not. However, I'm not sure that this is the greatest idea, as it uses a whole usize-worth of space for every is_leaf flag. Additionally, it would make allocation a pain. With the current state of DSTs, I don't see a way to use them to actually reduce the amount of extra space used.

@apasel422
Copy link
Contributor Author

Preliminary forward by-ref iteration benches:

test iter_100000_parent ... bench:     381,227 ns/iter (+/- 24,162)
test iter_100000_std    ... bench:   1,246,445 ns/iter (+/- 24,031)
test iter_1000_parent   ... bench:       2,950 ns/iter (+/- 612)
test iter_1000_std      ... bench:      12,452 ns/iter (+/- 1,868)
test iter_20_parent     ... bench:          54 ns/iter (+/- 14)
test iter_20_std        ... bench:         284 ns/iter (+/- 23)

@arthurprs
Copy link
Contributor

Huge gains!

@gereeter
Copy link
Contributor

Preliminary results from my code that doesn't use is_leaf flags:

test rand_1000000_apasel ... bench:         425 ns/iter (+/- 31)
test rand_1000000_parent ... bench:         343 ns/iter (+/- 29)
test rand_1000000_std    ... bench:         472 ns/iter (+/- 31)
test rand_100000_apasel  ... bench:         260 ns/iter (+/- 11)
test rand_100000_parent  ... bench:         282 ns/iter (+/- 21)
test rand_100000_std     ... bench:         298 ns/iter (+/- 22)
test rand_10000_apasel   ... bench:         196 ns/iter (+/- 9)
test rand_10000_parent   ... bench:         242 ns/iter (+/- 9)
test rand_10000_std      ... bench:         210 ns/iter (+/- 9)
test rand_100_apasel     ... bench:         108 ns/iter (+/- 7)
test rand_100_parent     ... bench:          99 ns/iter (+/- 6)
test rand_100_std        ... bench:         121 ns/iter (+/- 11)
test seq_1000000_apasel  ... bench:         181 ns/iter (+/- 4)
test seq_1000000_parent  ... bench:         168 ns/iter (+/- 8)
test seq_1000000_std     ... bench:         205 ns/iter (+/- 26)
test seq_100000_apasel   ... bench:         174 ns/iter (+/- 0)
test seq_100000_parent   ... bench:         174 ns/iter (+/- 5)
test seq_100000_std      ... bench:         177 ns/iter (+/- 8)
test seq_10000_apasel    ... bench:         121 ns/iter (+/- 0)
test seq_10000_parent    ... bench:          91 ns/iter (+/- 1)
test seq_10000_std       ... bench:         125 ns/iter (+/- 0)
test seq_100_apasel      ... bench:          69 ns/iter (+/- 0)
test seq_100_parent      ... bench:          46 ns/iter (+/- 3)
test seq_100_std         ... bench:          80 ns/iter (+/- 7)

To be honest, I'm not exactly sure what to make of this:

  • Most of the results look very promising, but, e.g., rand_10000 has a runtime significantly worse that std.
  • The results from @apasel422's branch are worse (in comparison to std) than what he was getting on his computer.
  • Many of the results for my branch and std got significantly worse after I implemented Drop, while @apasel422's results stayed pretty stable. This implies that for some reason std was benefitting from a memory leak.
  • My branch, due to the inherent safety issues of not using is_leaf flags, is significantly more compilated than @apasel422's implementation. As a quick check of just the syntactic overhead, my implementation has ~35% more lines than @apasel422's despite not implmenting iteration.

@gereeter
Copy link
Contributor

My WIP implementation is at https://github.com/gereeter/btree-rewrite.

@apasel422
Copy link
Contributor Author

@gereeter Perhaps we can combine forces to tackle this.

@gereeter
Copy link
Contributor

@apasel422 Yes, that would be a good idea. However, I'm not sure which approach should be used, and unfortunately I don't see any good way of combining our codebases because of all the mechanisms I use to control unsafety that simply aren't necessary in your codebase.

@arthurprs
Copy link
Contributor

Sorry to nag, but it looks like progress halted in all fronts. Is anyone still working on this?

@gereeter
Copy link
Contributor

gereeter commented Nov 9, 2015

Sorry for the stagnation. I was (and still am) fairly near completion, just needing to finish removal, but unfortunately schoolwork has been keeping me very busy and I haven't been thinking about this.

@arthurprs
Copy link
Contributor

@gereeter nice to hear that, did you and @apasel422 figure out the best model for the leaves?

@gereeter
Copy link
Contributor

My benchmarks seemed to rule in favor of not using is_leaf flags and just tracking what depth you are at.

bors added a commit that referenced this issue Jan 17, 2016
Despite being over 700 lines shorter, this implementation should use less memory than the previous one and is faster on at least insertions and iteration, the latter improving approximately 5x.

Technically a [breaking-change] due to removal of deprecated functions.

cc @apasel422 @gankro @goyox86

Fixes #27865.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/rust-lang/rust/30426)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants