Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose sorbus as green tree impl #69

Closed
wants to merge 1 commit into from

Conversation

CAD97
Copy link
Collaborator

@CAD97 CAD97 commented Jul 3, 2020

Because I can't help myself, apparently. Currently requires nightly and a local version of sorbus.

This is a version of the sorbus patch that just directly exposes sorbus as the green tree implementation, rather than trying to encapsulate it in #58. That explains the much larger line diff.

Again, the main difference is the change from using GreenNode/&GreenNode to using Arc<GreenNode>/ArcBorrow<'_, GreenNode>.

In order to support the strict separation required by the orphan rules, SyntaxElement is now its own enum rather than a typedef around NodeOrToken. This can, of course, be avoided if we just inline sorbus rather than using it as a separate crate (but I find the separation useful, tbh).

I think the "expose sorbus" approach is better than the "encapsulate sorbus" approach taken in #58, because the encapsulation is a lot of requisite unsafe type casting for little specific gain.

Does not add a SyntaxPtr type yet like the one provided in #58, but it can be implemented equivalently.

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to read sorbus more closely, but 👍 on the separation. I do think that immutable tree and thread-local zipper can and should live separately.

I guess it's stupid to ask this questions before reading sorbus source, but here are the things which are unsatisfactrory in current rowan:

  • perf (never can get enough of it :-) )
  • O(n) performance of SyntaxPtr
  • no support for lazy parsing (this is a big one, we'd want to implement skipping function bodies some time soon in rust-analyzer)


[features]
serde1 = [ "serde", "text-size/serde" ]
serde1 = [ "serde", "sorbus/ser" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice feature name!

@CAD97
Copy link
Collaborator Author

CAD97 commented Jul 3, 2020

  • SyntaxPtr

This is one of the primary benefits brought by using sorbus. Sorbus stores the offset of each child from its parent (in the parent's children array), so finding the child covering an offset is a binary search O(log w) rather than O(w). This means going from an offset to the node at that offset can be O(d log w) rather than O(d w) (depth d, node width w).

If you wanted to phrase these bounds in just n, then it'd probably (probabilistically) be current O(log n × log n) and sorbus O(log n × log log n). Here's a WolframAlpha expression to compare the behavior of these functions: Plot[{Log[x], Log[x] Log[x], Log[x] Log[Log[x]]}, {x, 1, 1000}]

  • perf

Needs measuring. Notable theoretical improvements that sorbus provides include:

  • (primary) No more double-indirection with &rowan::GreenNode, which is equivalent to &Arc<sorbus::green::Node>. ArcBorrow<'_, sorbus::green::Node> is used as a +0 way to pass the Arc pointer around without paying the reference counting cost.
  • (primary) Extra information stored for SyntaxPtr so that it can be O(d log w) rather than O(d w).
  • (primary) Cache deduplication is always O(w) and not O(n).
  • Nodes are only constructable via a builder cache, strongly encouraging more deduplication of nodes. Rowan can still of course provide a thread-local cache for node construction on top of sorbus's explicit builder cache if it wants. This also gives a local place to experiment with more interesting cache strategies.
  • Cache is zero-alloc on cache hits. This is easily also done in rowan's current impl using the same strategy. Requires Rust 1.46.
  • Memory usage I expect to come out to about the same overall, but storing more information (child offsets from parent). The number of allocations is one fewer for every SmolStr that previously allocated.

lazy parsing

I have a plan for that, using arc-swap. I just need to make sure that the design won't end up breaking all of the places we give out pointers with "interesting" ownership semantics. If we do want/need to add a third node type as a "thunk", sorbus is set up better to accommodate it than rowan's current design.

sorbus/ser

I even have a sorbus/deserialization feature with proper deduplication of nodes and full support of compact binary formats as well! It was fun learning how to use serde's DeserializeSeed interface.


I think at this point it's probably about time to move cad97/sorbus to rust-analyzer/sorbus. (I've invited you to cad97/sorbus so you can do as much.) If you want to set up a sync time to go over the design and internals of sorbus, please do ping me on Zulip.

@CAD97
Copy link
Collaborator Author

CAD97 commented Jul 29, 2021

Closing as (heavily) outdated. I'm still somewhat interested in exploring the design space some more, but it's not important.

@CAD97 CAD97 closed this Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants