Use sorbus internally as the green tree implementation #58

CAD97 · 2020-04-27T01:58:25Z

This rejigs rowan's green tree implementation to use sorbus's.

This implementation completely encapsulates sorbus. (That's part of the complexity of the patch, as we need to convert between rowan::GreenNode and sorbus::green::Node on the heap, copyless.) This is due in part to an initial attempt not doing this showing multiple impls rowan does that would be incoherent if it were to just reexport the sorbus types as appropriate.

rc-borrow, however, is an added public dependency. This is because while rowan previously smashed the green node and token handles together by ensuring they never overlap, and then handed out effectively &Arc<ActualNode>, sorbus's node type is actually the type on the heap, and the owned handle is a normal Arc<Node>.

Basically, what was previously GreenNode is now Arc<GreenNode>, and what was previously &GreenNode is now ArcBorrow<'_, GreenNode>.

I verified that the tests and both examples run clean under miri (after increasing the recursion limit).

Draft until at the very least sorbus is published to crates.

sorbus/erasable are entirely encapsulated, and rc-borrow is a public dep.

CAD97 · 2020-04-28T17:48:26Z

This now exposes the "child at offset" functionality required to properly solve rust-lang/rust-analyzer#3934.

matklad · 2020-04-28T19:42:40Z

I am both excited to see this and a bit terrified review-time-wise :)

CAD97 · 2020-04-29T01:44:47Z

@matklad if you want to set up a sync time to just walk through sorbus and/or rowan's use of sorbus, I'm pretty much available anytime at the moment. (Probably best for me is afternoon/evening US Eastern, though.) Just ping me on Zulip and I'll probably see it.

CAD97 · 2020-04-29T02:20:16Z

experimental r-a patch: CAD97/rust-analyzer#2

matklad · 2020-04-29T06:31:22Z

experimental r-a patch:

What about perf-memory difference? It's probably to measure those using only parser/symbols, and not full-fledged analysis stats (thoguh global picture wouldn't hurt).

CAD97 · 2020-05-08T20:23:07Z

What about perf-memory difference?

I'm not sure how to test this (or if I even can on Windows).

I can draw diagrams for the tree layout, though, to estimate the memory usage difference. I'm only considering 64 bit targets here, for simplicity.

Current:

  Token                    TokenData
+----------------+       +--------------------+
| pointer |  0b1 |------>| ArcInner (2×usize) |
+----------------+       | SyntaxKind (u16)   |
    (arc style)          | SmolStr (3×usize)  |
                         +--------------------+

  Node                     NodeData
+----------------+       +--------------------+
| pointer & !0b1 |------>| ArcInner (2×usize) |
+----------------+       | N (usize)          |
    (arc style)          | SyntaxKind (u16)   |
                         | TextSize (u32)     |
                         | [PackedElement; N] |
                         +--------------------+

  PackedElement = union { Token, Node };

Each unique token: 48 bytes†.
Each unique node: 32 bytes, plus 8 bytes for each child edge.

† each unique token text that is not whitespace and >22 bytes requires an additional Arc<str>.

Sorbus:

  Arc<Token>             Token
+--------------+       +--------------------+
| thin pointer |------>| ArcInner (2×usize) |
+--------------+       | N (u32)            |
                       | SyntaxKind (u16)   |
                       | text: [u8; N]      |
                       +--------------------+

  Arc<Node>              Node
+--------------+       +--------------------+
| thin pointer |------>| ArcInner (2×usize) |
+--------------+       | N (u16)            |
                       | SyntaxKind (u16)   |
                       | TextSize (u32)     |
                       | [PackedChild; N]   |
                       +--------------------+

  PackedChild = union {
      (TextSize, PackedElement),
      (PackedElement, TextSize),
  };  (align u32, size 3×u32)

  PackedElement = union {
      Arc<Token> | 0b1,
      Arc<Node> & !0b1,
  };

Each unique token: 22 bytes, plus the size of the token text (rounded up to the nearest 2 mod 8).
Each unique node: 24 bytes, plus 12 bytes for each child edge† (rounded up to the nearest 0 mod 8).

† each child edge is carrying an additional 4 bytes of information over the current tree.

It's hard to say the exact expected memory usage just by looking at this, however:

A non-whitespace token with length that fits in a SmolStr will fit in the same space (or smaller).
A non-whitespace token with length that does not fit in a SmolStr will still only use a single allocation and allocate less space.
A whitespace token >26 bytes will take up more space (but still be deduplicated).
A node has a constant size decrease of 8 bytes.
Each node child edge takes 50% more space (but this is due to storing more information).

The key advantage, then, is probably the more principled design, rather than any specific space savings; I'd guess space to be about the same or slightly higher (but less higher than the binsearch branch).

And that more principled design means handing out actual &Node rather than &Arc<Node>, saving an indirection. So this probably will be felt more in performance of syntax tree APIs than actual memory usage, so I guess it does need someone who knows how to test it to test it.

Even if sorbus's approach turns out to be a bust, and the ArcBorrow usage doesn't turn out to be worth anything, I think I do want to get around to reimplementing the current implementation with pointer-utils/erasable/slice-dst rather than thin-dst, just because they provide a more elegant abstraction layer.

CAD97 · 2020-06-26T15:41:17Z

I'm going to close this.

I'll reopen again once Rust 1.46 is out and sorbus is on stable again. Alloc-free cache hits is a big deal that I saw fit to go ahead and merge into sorbus to prevent the impl there from bitrotting.

Probably when 1.46 hits beta I'll start back in on experiments. I'd like to be able to show that sorbus at the very least provides the same level of performance and memory usage patterns as current rowan, given that it's actually storing more information in the tree so that offset -> Node is a sublinear thing instead of a superlinear thing. (Current estimate: O(log² n) vs O(n log n), or more specifically O(d log w) vs O(d w) for d = (avg) node depth, w = (avg) node width.)

I'd also like to present two approaches: encapsulating sorbus and re-exporting sorbus. This PR took the approach of completely hiding sorbus from the outside world, but it would be interesting to also explore just re-exporting sorbus's types at the proper places. This PR took the encapsulated approach in part due to coherence rules, so that will again be an interesting experience, and probably inform whether sorbus stays green-only and Rowan is the syntax layer (cool!) or if sorbus needs to grow some syntax layer support of its own.

Once Rowan is using sorbus, I've got a concept for a single unified syntax tree (rather than the cursor/API split that currently exists) that I want to experiment with as well.

Anyway, that's my plans around Rowan/sorbus. And eventually I need to get back to the blog post series I was working on before getting sidetracked improving the tools it's built on...

CAD97 added 5 commits April 26, 2020 20:27

Implement green as a shim over sorbus

67f1d38

Update the rest of rowan to use the sorbus wrapping green tree

defb039

Be more explicit on public dependencies

89c7c6c

sorbus/erasable are entirely encapsulated, and rc-borrow is a public dep.

Fix Serialize impl

4038e51

Add SyntaxPtr type

49ff54d

CAD97 requested a review from matklad April 28, 2020 17:49

CAD97 added 2 commits April 28, 2020 21:20

⬆️ sorbus

a8a3f1c

Api needs to also give ArcBorrow<Node>

f39b846

CAD97 force-pushed the sorbus branch from abfc301 to f39b846 Compare April 29, 2020 01:44

kiljacken mentioned this pull request May 1, 2020

RA uses quite a lot of RAM rust-lang/rust-analyzer#1252

Closed

CAD97 mentioned this pull request May 18, 2020

Avoid allocating before we have checked if it is in the cache + don't rehash subtree #59

Closed

CAD97 closed this Jun 26, 2020

CAD97 mentioned this pull request Jul 3, 2020

Expose sorbus as green tree impl #69

Closed

davidbarsky mentioned this pull request Jun 24, 2024

A Plan for Making Rust Analyzer Faster rust-lang/rust-analyzer#17491

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use sorbus internally as the green tree implementation #58

Use sorbus internally as the green tree implementation #58

CAD97 commented Apr 27, 2020 •

edited

Loading

CAD97 commented Apr 28, 2020

matklad commented Apr 28, 2020

CAD97 commented Apr 29, 2020 •

edited

Loading

CAD97 commented Apr 29, 2020

matklad commented Apr 29, 2020

CAD97 commented May 8, 2020

CAD97 commented Jun 26, 2020

Use sorbus internally as the green tree implementation #58

Use sorbus internally as the green tree implementation #58

Conversation

CAD97 commented Apr 27, 2020 • edited Loading

CAD97 commented Apr 28, 2020

matklad commented Apr 28, 2020

CAD97 commented Apr 29, 2020 • edited Loading

CAD97 commented Apr 29, 2020

matklad commented Apr 29, 2020

CAD97 commented May 8, 2020

CAD97 commented Jun 26, 2020

CAD97 commented Apr 27, 2020 •

edited

Loading

CAD97 commented Apr 29, 2020 •

edited

Loading