Change `BTreeMap` to use parent pointers #27865

apasel422 · 2015-08-17T16:47:00Z

This will require nodes to have a stable address.

Gankra · 2015-08-17T16:51:54Z

All nodes are boxed, so this is trivially true?

apasel422 · 2015-08-17T16:53:48Z

BTreeMap directly contains a Node, and actually no nodes are boxed. They have fields for the allocated keys/vals/edges instead of being boxed themselves.

apasel422 · 2015-08-17T16:57:22Z

Well, the root node definitely isn't boxed. I guess the non-root ones do have a stable address, though, because their parent nodes' allocations never change (as far as I can tell).

Gankra · 2015-08-17T17:12:19Z

Oh whoops, yes you're right. I forgot that we did that.

I fear that without ManuallyDrop we're going to have to go Full HashMap and manage all of our allocations/destructing.

Gankra · 2015-08-17T17:16:54Z

Oh wait no we're 90% of the way to Full HashMap anyway. I'm remembering the good ol' days...

Gankra · 2015-08-17T17:49:22Z

Auuuugh all of this is coming back to me.

Basically it seems to me that you "want"

Node {
  len: u8,
  parent_idx: u8,
  is_leaf: bool,
  parent: *mut Node,
  keys: ManuallyDrop<[K; n]>,
  vals: ManuallyDrop<[V; n]>,
  edges: ManuallyDrop<[Box<Node>; n + 1]>
}

Map {
  root: Option<Box<Node>>
}

But this design forbids you from not allocating space for your edges in your leaves. This is why we currently have (effectively):

Node {
    len: u8,
    // Actually just Uniques that we manually allocate and manage
    keys: Box<ManuallyDrop<[K; n]>>,
    vals: Box<ManuallyDrop<[V; n]>>,
    edges: Option<Box<ManuallyDrop<[Node; n + 1]>>>,
}

Which enables the edges to not be allocated (doubling as is_leaf). We then also abuse this fact to keep the root inline, and avoid an extra allocation/indirection.
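A minimal sketch of why the optional edges pointer can double as the leaf flag. It assumes simplified `u32` keys and values, zero-initialized arrays instead of manually managed ones, a capacity of 11, and a hypothetical `NodeSketch` type. Because `None` is encoded as the null pointer, the flag costs no space beyond the pointer itself.

```rust
use std::mem::size_of;

const N: usize = 11; // hypothetical node capacity

/// Simplified stand-in for the current layout: keys and values live in
/// their own allocations, and leaves skip the edges allocation entirely.
#[allow(dead_code)]
struct NodeSketch {
    len: u8,
    keys: Box<[u32; N]>,
    vals: Box<[u32; N]>,
    edges: Option<Box<[NodeSketch; N + 1]>>,
}

impl NodeSketch {
    fn new_leaf() -> Self {
        NodeSketch { len: 0, keys: Box::new([0; N]), vals: Box::new([0; N]), edges: None }
    }

    /// No separate flag: a leaf is simply a node without an edges allocation.
    fn is_leaf(&self) -> bool {
        self.edges.is_none()
    }
}

/// `None` is represented by the null pointer, so the optional edges
/// pointer is exactly one word wide.
fn edges_field_size() -> usize {
    size_of::<Option<Box<[NodeSketch; N + 1]>>>()
}
```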

We could also consider a hybrid:

Node {
    len: u8,
    keys: [K; n],
    vals: [V; n],
    edges: Option<Box<ManuallyDrop<[Node; n + 1]>>>,
}

But this has the negative side-effect of making the edges array enormous.
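To put a rough number on "enormous", the sketch below measures the hybrid layout, assuming hypothetical `u64` keys and values and 11 slots per node. Each edge slot holds a whole `HybridNode` rather than a pointer, so one internal node's edges allocation is about twelve full nodes.

```rust
use std::mem::size_of;

const N: usize = 11; // hypothetical node capacity

/// The hybrid layout: keys and values inline, edges as one boxed array of
/// whole nodes. Every edge slot therefore contains a full node.
#[allow(dead_code)]
struct HybridNode {
    len: u8,
    keys: [u64; N],
    vals: [u64; N],
    edges: Option<Box<[HybridNode; N + 1]>>,
}

/// Bytes in one internal node's edges allocation under the hybrid layout.
fn hybrid_edges_bytes() -> usize {
    size_of::<[HybridNode; N + 1]>()
}

/// Bytes the same allocation would need if it held only pointers.
fn pointer_edges_bytes() -> usize {
    size_of::<[Box<HybridNode>; N + 1]>()
}
```

On a typical 64-bit target this comes to roughly 2.3 KB for the edges alone, against 96 bytes for an array of pointers.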

I think google's btree does the following:

struct BaseNode {
   is_leaf: bool,
   keys: ManuallyDrop<[K; n]>,
   vals: ManuallyDrop<[V; n]>,
   // and other common stuff like parent pointers
}

// Inherit from BaseNode
struct InternalNode: BaseNode {
   edges: ManuallyDrop<[Unique<BaseNode>; n+1]>
}

And then based on the value of is_leaf, may cast the BaseNode to an InternalNode.

This works because a Leaf never becomes Internal or vice-versa. So leaves only allocate the space they need, and internal nodes allocate more but get type-erased to "only" leaves. And then blah blah figure out deallocation yourself whatever.

apasel422 · 2015-08-17T19:45:33Z

The type-erasure approach seems best, but as I understand it, it's technically undefined behavior for us to transmute between the two struct types, right?

Gankra · 2015-08-17T20:42:12Z

Not if they're repr(C) and one is a prefix of the other. You would be transmuting the pointers, of course.
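A minimal sketch of that pointer cast, assuming `u32` keys, no values, and a hypothetical capacity of 11. Because `LeafNode` is the first field of a `#[repr(C)]` `InternalNode`, a pointer to an internal node can be passed around as `*mut LeafNode` and cast back only after checking `is_leaf`. The names `LeafNode`, `InternalNode`, `leaf`, `sum_keys` and `free_tree` are illustrative.

```rust
const CAPACITY: usize = 11; // hypothetical: 2 * T - 1 with T = 6

/// The shared prefix: every node, leaf or internal, starts with these fields.
#[repr(C)]
struct LeafNode {
    is_leaf: bool,
    len: u8,
    keys: [u32; CAPACITY],
}

/// `#[repr(C)]` fixes `data` at offset 0, so `*mut InternalNode` and
/// `*mut LeafNode` point at the same bytes.
#[repr(C)]
struct InternalNode {
    data: LeafNode,
    edges: [*mut LeafNode; CAPACITY + 1],
}

fn leaf(keys: &[u32]) -> LeafNode {
    let mut node = LeafNode { is_leaf: true, len: keys.len() as u8, keys: [0; CAPACITY] };
    node.keys[..keys.len()].copy_from_slice(keys);
    node
}

/// Sums every key in the subtree, downcasting only when `is_leaf` is false.
unsafe fn sum_keys(node: *const LeafNode) -> u64 {
    let data = &*node;
    let len = data.len as usize;
    let mut total: u64 = data.keys[..len].iter().map(|&k| u64::from(k)).sum();
    if !data.is_leaf {
        // Sound only because every node with is_leaf == false was
        // allocated as an InternalNode.
        let internal = &*(node as *const InternalNode);
        for &child in &internal.edges[..=len] {
            total += sum_keys(child);
        }
    }
    total
}

/// Frees a tree built with `Box::into_raw`, casting each node back to the
/// type it was allocated as so the deallocation size is correct.
unsafe fn free_tree(node: *mut LeafNode) {
    if (*node).is_leaf {
        drop(Box::from_raw(node));
    } else {
        let internal = Box::from_raw(node as *mut InternalNode);
        for &child in &internal.edges[..=internal.data.len as usize] {
            free_tree(child);
        }
    }
}
```

Leaves pay only for the prefix, while internal nodes allocate the larger struct and are type-erased to the prefix everywhere else.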

apasel422 · 2015-08-17T21:39:19Z

OK, I got a prototype working with type-erasure and parent-pointer insertion, though it's not using the existing BTree code. I need to play around with it more in isolation before attempting a PR.

apasel422 · 2015-08-18T14:42:05Z

How important is it to keep the ability to specify B? It seems like we're waiting for type-level integers (or associated constants) to land before stabilizing that functionality anyway.

apasel422 · 2015-08-18T18:18:29Z

I'm leaning toward the following representation:

const T: usize = 6;

pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}

#[repr(C)]
struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
    is_leaf: bool,
}

#[repr(C)]
struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: [*mut Node<K, V>; 2 * T],
}

We don't need ManuallyDrop right now, because we can just take the root and mem::forget the node after running the key and value destructors and recursively dropping any edges. We should be able to easily replace the T const with an integer parameter when those are ready.
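In current Rust, the uninitialized tail of the key and value arrays is usually expressed with `MaybeUninit` (which postdates this discussion) rather than plain `[K; N]` arrays. With that substitution, the drop strategy described above looks roughly like the sketch below: destructors run only for the first `len` keys and values, and because `MaybeUninit` never drops its contents, freeing the storage afterwards plays the role of `mem::forget`. Recursion into edges is omitted, and `LeafNode`, `new` and `push` are hypothetical names.

```rust
use std::mem::MaybeUninit;

const CAPACITY: usize = 11; // hypothetical: 2 * T - 1 with T = 6

struct LeafNode<K, V> {
    len: usize,
    keys: [MaybeUninit<K>; CAPACITY],
    vals: [MaybeUninit<V>; CAPACITY],
}

impl<K, V> LeafNode<K, V> {
    fn new() -> Self {
        LeafNode {
            len: 0,
            // An array of MaybeUninit is valid without any initialization.
            keys: unsafe { MaybeUninit::uninit().assume_init() },
            vals: unsafe { MaybeUninit::uninit().assume_init() },
        }
    }

    fn push(&mut self, key: K, val: V) {
        assert!(self.len < CAPACITY, "node is full");
        self.keys[self.len].write(key);
        self.vals[self.len].write(val);
        self.len += 1;
    }
}

impl<K, V> Drop for LeafNode<K, V> {
    fn drop(&mut self) {
        for i in 0..self.len {
            // Only slots below `len` were written; the tail is never touched.
            unsafe {
                self.keys[i].assume_init_drop();
                self.vals[i].assume_init_drop();
            }
        }
        // The arrays themselves are then freed without a second destructor
        // run, since MaybeUninit<T> does not drop T.
    }
}
```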

Gankra · 2015-08-18T18:26:42Z

Being able to specify B is totally unimportant and is slated to be deprecated.

Also this layout wastes an extra ptr of space over using a u8 for len and idx on 32-bit (no effect on 64-bit).
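To make the 32-bit versus 64-bit claim concrete, this sketch isolates the trailing fields of the proposed node; the key and value arrays are identical in both variants and are left out. `TailU16` and `TailU8` are hypothetical names, and `parent` is typed as `*mut u8` only to keep the sketch self-contained.

```rust
use std::mem::size_of;

/// Trailing fields as proposed, with u16 counters.
#[allow(dead_code)]
#[repr(C)]
struct TailU16 {
    parent: *mut u8,
    len: u16,
    idx: u16,
    is_leaf: bool,
}

/// The same fields with u8 counters.
#[allow(dead_code)]
#[repr(C)]
struct TailU8 {
    parent: *mut u8,
    len: u8,
    idx: u8,
    is_leaf: bool,
}

/// Extra bytes the u16 counters cost once padded to pointer alignment.
fn wasted_bytes() -> usize {
    size_of::<TailU16>() - size_of::<TailU8>()
}
```

With `u16` counters the three small fields need 5 bytes and spill into another pointer-sized slot on 32-bit targets; with `u8` counters they fit in 3 bytes, and on 64-bit both variants round up to 16 bytes.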

As a minor bikeshed I'd maybe call idx "parent_idx" or whatever.

Gankra · 2015-08-18T18:28:16Z

Oh but otherwise 👍 this is basically what I had in mind.

apasel422 · 2015-08-18T18:30:20Z

I was aware of the waste, but unsure whether we wanted to support B > 255. u8 should suffice, though. (In a future Rust, we might even be able to determine how big to make {len, parent_idx} based on B.)

What do we have to be aware of with zero-sized types here?

Gankra · 2015-08-18T18:41:33Z

Zero-sized types should Just Work in this case. In particular, the parent pointer ensures a nice minimum alignment.
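A small check of the zero-sized case, assuming the `#[repr(C)]` layout proposed above with `u8` counters and a hypothetical capacity of 11. Even when both `K` and `V` are `()`, the `parent` field gives the node a nonzero size and pointer alignment, so every boxed node is a real allocation with its own address. `zst_leaf` is a hypothetical helper.

```rust
use std::ptr;

const CAPACITY: usize = 11; // hypothetical: 2 * T - 1 with T = 6

#[allow(dead_code)]
#[repr(C)]
struct Node<K, V> {
    keys: [K; CAPACITY],
    vals: [V; CAPACITY],
    parent: *mut Node<K, V>,
    len: u8,
    parent_idx: u8,
    is_leaf: bool,
}

/// An empty leaf whose keys and values are both zero-sized.
fn zst_leaf() -> Node<(), ()> {
    Node {
        keys: [(); CAPACITY],
        vals: [(); CAPACITY],
        parent: ptr::null_mut(),
        len: 0,
        parent_idx: 0,
        is_leaf: true,
    }
}
```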

apasel422 · 2015-08-19T14:14:59Z

Here are some preliminary benchmarks for just insertion:

test rand_100000_parent ... bench:         172 ns/iter (+/- 9)
test rand_100000_std    ... bench:         208 ns/iter (+/- 14)
test rand_10000_parent  ... bench:         109 ns/iter (+/- 8)
test rand_10000_std     ... bench:         146 ns/iter (+/- 8)
test rand_100_parent    ... bench:          63 ns/iter (+/- 10)
test rand_100_std       ... bench:          84 ns/iter (+/- 17)
test seq_100000_parent  ... bench:          73 ns/iter (+/- 5)
test seq_100000_std     ... bench:         115 ns/iter (+/- 7)
test seq_10000_parent   ... bench:          58 ns/iter (+/- 3)
test seq_10000_std      ... bench:          84 ns/iter (+/- 14)
test seq_100_parent     ... bench:          21 ns/iter (+/- 2)
test seq_100_std        ... bench:          44 ns/iter (+/- 5)

Gankra · 2015-08-19T15:42:14Z

Fantastic!!!

apasel422 · 2015-08-19T15:44:35Z

The problem is patching the new stuff into the existing code; it's likely easier to just rewrite the entire module than change things piece by piece.

Gankra · 2015-08-19T16:29:32Z

I have no problem with that, as long as you're down to do it.

arthurprs · 2015-08-20T20:48:21Z

Woa, impressive gains!

apasel422 · 2015-08-21T01:32:20Z

I will be implementing this as a standalone library before attempting a patch: https://github.com/apasel422/btree (currently empty)

gereeter · 2015-08-21T16:33:41Z

An alternative to the #[repr(C)] trick would be to store the is_leaf information a level up, with the edges, as follows:

const T: usize = 6;

pub struct BTreeMap<K, V> {
    root: Option<Box<Node<K, V>>>,
    ...
}

struct Node<K, V> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    parent: *mut InternalNode<K, V>,
    len: u16,
    idx: u16,
}

struct InternalNode<K, V> {
    node: Node<K, V>,
    edges: EdgeData<K, V>,
}

enum EdgeData<K, V> {
    Leaves([Box<Node<K, V>>; 2 * T]),
    Internals([Box<InternalNode<K, V>>; 2 * T]),
}

eddyb · 2015-08-21T17:07:16Z

@gereeter But can't each of the sub-nodes be either a leaf or an internal node?
One nice trick here would be to use the lowest bit to indicate whether a node is internal or a leaf, though you can't do it in entirely safe code just yet (it's not even an optimization you can do without opting out of being able to take references to a Box<Node> field).
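A sketch of the low-bit trick, assuming the pointed-to type has an alignment of at least 2 so the lowest address bit is always free. `TaggedEdge` is a hypothetical name. The tagged value is no longer a valid `Box`, which is why the sketch works with raw pointers; this matches the caveat about taking references to a `Box<Node>` field.

```rust
use std::mem::align_of;

/// A tagged edge: the low bit of the address says whether the target is an
/// internal node (1) or a leaf (0).
struct TaggedEdge<T> {
    raw: *mut T,
}

impl<T> TaggedEdge<T> {
    fn new(ptr: *mut T, is_internal: bool) -> Self {
        assert!(align_of::<T>() >= 2, "no spare low bit for the tag");
        let bytes = ptr as *mut u8;
        // wrapping_add keeps the pointer's provenance while setting the bit.
        let tagged = if is_internal { bytes.wrapping_add(1) } else { bytes };
        TaggedEdge { raw: tagged as *mut T }
    }

    fn is_internal(&self) -> bool {
        (self.raw as usize) & 1 == 1
    }

    /// Strips the tag to recover the original, properly aligned pointer.
    fn ptr(&self) -> *mut T {
        let bytes = self.raw as *mut u8;
        let clean = if self.is_internal() { bytes.wrapping_sub(1) } else { bytes };
        clean as *mut T
    }
}
```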

apasel422 · 2015-08-21T17:40:34Z

@gereeter In your example, given an &Node<K, V>, how do you know if it's internal or a leaf?

Gankra · 2015-08-21T18:12:08Z

@eddyb does that actually accomplish anything? We have a bunch of extra bytes in the node struct anyway.

bluss · 2015-08-21T18:19:47Z

@eddyb All leaves are at the same depth in a b-tree

gereeter · 2015-08-21T18:20:03Z

@apasel422 You can't figure it out. However, you could have NodeRef types that keep track of that information.

apasel422 · 2015-08-21T18:22:30Z

@gereeter What I mean is, how would you use that to perform an insertion? The root in your example is a Node.

gereeter · 2015-08-21T18:23:31Z

@apasel422 Oh, whoops. You would need to do the same sort of branching at the root instead of just storing a Node.

gereeter · 2015-08-21T18:32:30Z

Note: I think that the ideal representation, in pseudo-Rust, is the following.

struct BTreeMap<K, V> {
    height: usize,
    root: Option<Box<Node<K, V, height>>>
}

struct Node<K, V, height: usize> {
    keys: [K; 2 * T - 1],
    vals: [V; 2 * T - 1],
    edges: if height > 0 {
        [Box<Node<K, V, height - 1>>; 2 * T]
    } else { () },
    parent: *mut Node<K, V, height + 1>,
    len: u16,
    idx: u16,
}

Essentially, we don't store is_leaf information anywhere except for the root and simply infer this information everywhere from the root. I'm trying to write an implementation that uses this representation, but it is difficult to keep the unsafety under control.
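A sketch of the height-driven approach, assuming `u32` keys, no values, and a hypothetical capacity of 11. Nodes carry no `is_leaf` byte; the caller passes the root's height instead, and because every leaf sits at the same depth, the search treats a node as an `InternalNode` exactly when the remaining height is nonzero. `LeafNode`, `InternalNode`, `leaf` and `contains` are illustrative names.

```rust
const CAPACITY: usize = 11; // hypothetical: 2 * T - 1 with T = 6

/// Shared prefix of every node. Note there is no is_leaf field.
#[repr(C)]
struct LeafNode {
    len: u8,
    keys: [u32; CAPACITY],
}

#[repr(C)]
struct InternalNode {
    data: LeafNode,
    edges: [*const LeafNode; CAPACITY + 1],
}

fn leaf(keys: &[u32]) -> LeafNode {
    let mut node = LeafNode { len: keys.len() as u8, keys: [0; CAPACITY] };
    node.keys[..keys.len()].copy_from_slice(keys);
    node
}

/// Looks up `key` starting from a root of the given height; the counter
/// alone decides when a node may be treated as internal.
unsafe fn contains(mut node: *const LeafNode, mut height: usize, key: u32) -> bool {
    loop {
        let data = &*node;
        match data.keys[..data.len as usize].binary_search(&key) {
            Ok(_) => return true,
            Err(_) if height == 0 => return false,
            Err(idx) => {
                // height > 0, so this allocation really is an InternalNode.
                let internal = &*(node as *const InternalNode);
                node = internal.edges[idx];
                height -= 1;
            }
        }
    }
}
```

Passing a wrong height makes the cast undefined behavior, which is one concrete form of the unsafety that is hard to keep under control.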

apasel422 · 2015-08-21T18:45:46Z

@gereeter Is it possible that DSTs could help there, since nodes will have to be behind a pointer anyway?

gereeter · 2015-08-21T18:50:57Z

@apasel422 You could use edges: [Box<Node<K, V>>] and check the length of that array to see if you are in a leaf or not. However, I'm not sure that this is the greatest idea, as it uses a whole usize-worth of space for every is_leaf flag. Additionally, it would make allocation a pain. With the current state of DSTs, I don't see a way to use them to actually reduce the amount of extra space used.
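To see the space cost, the sketch below declares a hypothetical `DstNode` whose edges are an unsized trailing slice (a leaf would carry an empty slice) and compares pointer sizes with a sized node that keeps an explicit flag. It only measures layout; actually allocating such a node is the awkward part and is not attempted.

```rust
use std::mem::size_of;

/// Edges as an unsized trailing slice: `edges.len() == 0` replaces is_leaf.
#[allow(dead_code)]
struct DstNode {
    len: u8,
    keys: [u32; 11],
    edges: [Box<DstNode>],
}

/// A sized node with an explicit one-byte flag, for comparison.
#[allow(dead_code)]
struct SizedNode {
    len: u8,
    is_leaf: bool,
    keys: [u32; 11],
}

/// Every pointer to a `DstNode` is fat: address plus slice length.
fn dst_edge_size() -> usize {
    size_of::<Box<DstNode>>()
}

/// A pointer to a sized node is a single word.
fn sized_edge_size() -> usize {
    size_of::<Box<SizedNode>>()
}
```

Each edge pointer grows from one word to two, so the length that stands in for the flag costs a full `usize` per pointer.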

apasel422 · 2015-08-21T22:32:43Z

Preliminary forward by-ref iteration benches:

test iter_100000_parent ... bench:     381,227 ns/iter (+/- 24,162)
test iter_100000_std    ... bench:   1,246,445 ns/iter (+/- 24,031)
test iter_1000_parent   ... bench:       2,950 ns/iter (+/- 612)
test iter_1000_std      ... bench:      12,452 ns/iter (+/- 1,868)
test iter_20_parent     ... bench:          54 ns/iter (+/- 14)
test iter_20_std        ... bench:         284 ns/iter (+/- 23)

arthurprs · 2015-08-21T23:33:33Z

Huge gains!

gereeter · 2015-08-23T18:50:30Z

Preliminary results from my code that doesn't use is_leaf flags:

test rand_1000000_apasel ... bench:         425 ns/iter (+/- 31)
test rand_1000000_parent ... bench:         343 ns/iter (+/- 29)
test rand_1000000_std    ... bench:         472 ns/iter (+/- 31)
test rand_100000_apasel  ... bench:         260 ns/iter (+/- 11)
test rand_100000_parent  ... bench:         282 ns/iter (+/- 21)
test rand_100000_std     ... bench:         298 ns/iter (+/- 22)
test rand_10000_apasel   ... bench:         196 ns/iter (+/- 9)
test rand_10000_parent   ... bench:         242 ns/iter (+/- 9)
test rand_10000_std      ... bench:         210 ns/iter (+/- 9)
test rand_100_apasel     ... bench:         108 ns/iter (+/- 7)
test rand_100_parent     ... bench:          99 ns/iter (+/- 6)
test rand_100_std        ... bench:         121 ns/iter (+/- 11)
test seq_1000000_apasel  ... bench:         181 ns/iter (+/- 4)
test seq_1000000_parent  ... bench:         168 ns/iter (+/- 8)
test seq_1000000_std     ... bench:         205 ns/iter (+/- 26)
test seq_100000_apasel   ... bench:         174 ns/iter (+/- 0)
test seq_100000_parent   ... bench:         174 ns/iter (+/- 5)
test seq_100000_std      ... bench:         177 ns/iter (+/- 8)
test seq_10000_apasel    ... bench:         121 ns/iter (+/- 0)
test seq_10000_parent    ... bench:          91 ns/iter (+/- 1)
test seq_10000_std       ... bench:         125 ns/iter (+/- 0)
test seq_100_apasel      ... bench:          69 ns/iter (+/- 0)
test seq_100_parent      ... bench:          46 ns/iter (+/- 3)
test seq_100_std         ... bench:          80 ns/iter (+/- 7)

To be honest, I'm not exactly sure what to make of this:

Most of the results look very promising, but, e.g., rand_10000 has a runtime significantly worse than std.
The results from @apasel422's branch are worse (in comparison to std) than what he was getting on his computer.
Many of the results for my branch and std got significantly worse after I implemented Drop, while @apasel422's results stayed pretty stable. This implies that for some reason std was benefitting from a memory leak.
My branch, due to the inherent safety issues of not using is_leaf flags, is significantly more complicated than @apasel422's implementation. As a quick check of just the syntactic overhead, my implementation has ~35% more lines than @apasel422's despite not implementing iteration.

gereeter · 2015-08-23T18:51:56Z

My WIP implementation is at https://github.com/gereeter/btree-rewrite.

apasel422 · 2015-08-24T21:21:39Z

@gereeter Perhaps we can combine forces to tackle this.

gereeter · 2015-08-26T22:20:43Z

@apasel422 Yes, that would be a good idea. However, I'm not sure which approach should be used, and unfortunately I don't see any good way of combining our codebases because of all the mechanisms I use to control unsafety that simply aren't necessary in your codebase.

arthurprs · 2015-11-09T16:27:39Z

Sorry to nag, but it looks like progress has halted on all fronts. Is anyone still working on this?

gereeter · 2015-11-09T16:53:50Z

Sorry for the stagnation. I was (and still am) fairly near completion, just needing to finish removal, but unfortunately schoolwork has been keeping me very busy and I haven't been thinking about this.

arthurprs · 2015-11-09T17:17:46Z

@gereeter nice to hear that, did you and @apasel422 figure out the best model for the leaves?

gereeter · 2015-11-11T16:36:20Z

My benchmarks seemed to rule in favor of not using is_leaf flags and just tracking what depth you are at.

@apasel422

Despite being over 700 lines shorter, this implementation should use less memory than the previous one and is faster on at least insertions and iteration, the latter improving approximately 5x. Technically a [breaking-change] due to removal of deprecated functions. cc @apasel422 @gankro @goyox86 Fixes #27865.

steveklabnik added the A-libs label Aug 17, 2015

gereeter mentioned this issue Dec 17, 2015

Rewrite BTreeMap to use parent pointers. #30426

Merged

bors closed this as completed in #30426 Jan 17, 2016
