Expose sorbus as green tree impl #69

CAD97 · 2020-07-03T02:25:23Z

Because I can't help myself, apparently. Currently requires nightly and a local version of sorbus.

This is a version of the sorbus patch that just directly exposes sorbus as the green tree implementation, rather than trying to encapsulate it in #58. That explains the much larger line diff.

Again, the main difference is the change from using GreenNode/&GreenNode to using Arc<GreenNode>/ArcBorrow<'_, GreenNode>.

In order to support the strict separation required by the orphan rules, SyntaxElement is now its own enum rather than a typedef around NodeOrToken. This can, of course, be avoided if we just inline sorbus rather than using it as a separate crate (but I find the separation useful, tbh).

I think the "expose sorbus" approach is better than the "encapsulate sorbus" approach taken in #58, because the encapsulation is a lot of requisite unsafe type casting for little specific gain.

Does not add a SyntaxPtr type yet like the one provided in #58, but it can be implemented equivalently.

matklad

I need to read sorbus more closely, but 👍 on the separation. I do think that immutable tree and thread-local zipper can and should live separately.

I guess it's stupid to ask this questions before reading sorbus source, but here are the things which are unsatisfactrory in current rowan:

perf (never can get enough of it :-) )
O(n) performance of SyntaxPtr
no support for lazy parsing (this is a big one, we'd want to implement skipping function bodies some time soon in rust-analyzer)

matklad · 2020-07-03T18:00:33Z

Cargo.toml


 [features]
-serde1 = [ "serde", "text-size/serde" ]
+serde1 = [ "serde", "sorbus/ser" ]


👍 nice feature name!

CAD97 · 2020-07-03T19:57:35Z

SyntaxPtr

This is one of the primary benefits brought by using sorbus. Sorbus stores the offset of each child from its parent (in the parent's children array), so finding the child covering an offset is a binary search O(log w) rather than O(w). This means going from an offset to the node at that offset can be O(d log w) rather than O(d w) (depth d, node width w).

If you wanted to phrase these bounds in just n, then it'd probably (probabilistically) be current O(log n × log n) and sorbus O(log n × log log n). Here's a WolframAlpha expression to compare the behavior of these functions: Plot[{Log[x], Log[x] Log[x], Log[x] Log[Log[x]]}, {x, 1, 1000}]

perf

Needs measuring. Notable theoretical improvements that sorbus provides include:

(primary) No more double-indirection with &rowan::GreenNode, which is equivalent to &Arc<sorbus::green::Node>. ArcBorrow<'_, sorbus::green::Node> is used as a +0 way to pass the Arc pointer around without paying the reference counting cost.
(primary) Extra information stored for SyntaxPtr so that it can be O(d log w) rather than O(d w).
(primary) Cache deduplication is always O(w) and not O(n).
Nodes are only constructable via a builder cache, strongly encouraging more deduplication of nodes. Rowan can still of course provide a thread-local cache for node construction on top of sorbus's explicit builder cache if it wants. This also gives a local place to experiment with more interesting cache strategies.
Cache is zero-alloc on cache hits. This is easily also done in rowan's current impl using the same strategy. Requires Rust 1.46.
Memory usage I expect to come out to about the same overall, but storing more information (child offsets from parent). The number of allocations is one fewer for every SmolStr that previously allocated.

lazy parsing

I have a plan for that, using arc-swap. I just need to make sure that the design won't end up breaking all of the places we give out pointers with "interesting" ownership semantics. If we do want/need to add a third node type as a "thunk", sorbus is set up better to accommodate it than rowan's current design.

sorbus/ser

I even have a sorbus/deserialization feature with proper deduplication of nodes and full support of compact binary formats as well! It was fun learning how to use serde's DeserializeSeed interface.

I think at this point it's probably about time to move cad97/sorbus to rust-analyzer/sorbus. (I've invited you to cad97/sorbus so you can do as much.) If you want to set up a sync time to go over the design and internals of sorbus, please do ping me on Zulip.

CAD97 · 2021-07-29T21:11:45Z

Closing as (heavily) outdated. I'm still somewhat interested in exploring the design space some more, but it's not important.

matklad reviewed Jul 3, 2020

View reviewed changes

Cargo.toml

[features]

serde1 = [ "serde", "text-size/serde" ]

serde1 = [ "serde", "sorbus/ser" ]

Copy link

Member

matklad Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice feature name!

CAD97 mentioned this pull request Jul 5, 2020

Checklist for use in rust-analyzer/rowan CAD97/sorbus#60

Open

13 tasks

Expose sorbus as green tree impl

58c994c

CAD97 force-pushed the expose-sorbus branch from a9df4d4 to 58c994c Compare July 16, 2020 01:04

matklad force-pushed the master branch from 7e88dbf to 866a4e2 Compare January 16, 2021 18:07

CAD97 closed this Jul 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose sorbus as green tree impl #69

Expose sorbus as green tree impl #69

CAD97 commented Jul 3, 2020 •

edited

Loading

matklad left a comment

matklad Jul 3, 2020

CAD97 commented Jul 3, 2020 •

edited

Loading

CAD97 commented Jul 29, 2021

Expose sorbus as green tree impl #69

Expose sorbus as green tree impl #69

Conversation

CAD97 commented Jul 3, 2020 • edited Loading

matklad left a comment

Choose a reason for hiding this comment

matklad Jul 3, 2020

Choose a reason for hiding this comment

CAD97 commented Jul 3, 2020 • edited Loading

CAD97 commented Jul 29, 2021

CAD97 commented Jul 3, 2020 •

edited

Loading

CAD97 commented Jul 3, 2020 •

edited

Loading