Consider using succinct data structures for read-only trees #107

matklad · 2021-07-03T13:09:22Z

I've recently been reading the literature on succinct data structures (SD), and I am wondering if they might be applicable to rowan. The tagline of SD is to represent data using approximately as few bits as entropy allows, while keeping all the operations fast.

In particular, for trees, we generally use 2 * n * sizeof<usize> bytes (parent/children pointers). SD allows using roughly 2n bits, while still allowing for efficient parent/child queries. This works due to the following tricks:

bitvectors can be represented very efficiently (you can pack 64 bits into a single word)
working with bitmasks is fast (popcount + precomputed tables)
by spending a few extra bytes, it's possible to augment a bitvector with fast prefix-sum data structures (rank & select)
it's possible to encode a tree as a bitvec, using a couple of bits per node, where parent/child relations are expressible via rank/select
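
To make the tricks above concrete, here is a minimal sketch of one well-known encoding, LOUDS (level-order unary degree sequence), layered on a bitvector with a popcount-based `rank`. All names (`RankBits`, `Louds`, `from_degrees`) are made up for illustration and are not part of rowan. The sketch also takes two shortcuts that a real succinct implementation would not: it stores one prefix count per 64-bit word instead of a compact multi-level index, and it implements `select` as a binary search over `rank` rather than in constant time.

```rust
/// Bit vector plus a cumulative popcount per 64-bit word, so `rank` is O(1).
/// (Illustrative only: real implementations use much sparser rank tables.)
pub struct RankBits {
    words: Vec<u64>,
    len: usize,
    /// prefix[i] = number of 1-bits in words[..i]
    prefix: Vec<usize>,
}

impl RankBits {
    pub fn from_bits(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &b) in bits.iter().enumerate() {
            if b {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        let mut prefix = vec![0usize];
        for w in &words {
            let last = *prefix.last().unwrap();
            prefix.push(last + w.count_ones() as usize);
        }
        RankBits { words, len: bits.len(), prefix }
    }

    pub fn get(&self, pos: usize) -> bool {
        (self.words[pos / 64] >> (pos % 64)) & 1 == 1
    }

    /// Number of 1-bits in positions [0, pos).
    pub fn rank1(&self, pos: usize) -> usize {
        let (w, b) = (pos / 64, pos % 64);
        if b == 0 {
            return self.prefix[w];
        }
        let mask = (1u64 << b) - 1;
        self.prefix[w] + (self.words[w] & mask).count_ones() as usize
    }

    /// Number of 0-bits in positions [0, pos).
    pub fn rank0(&self, pos: usize) -> usize {
        pos - self.rank1(pos)
    }

    /// Position of the k-th (0-indexed) bit equal to `bit`,
    /// found by binary search over rank: O(log n), not O(1).
    pub fn select(&self, bit: bool, k: usize) -> Option<usize> {
        let rank = |p: usize| if bit { self.rank1(p) } else { self.rank0(p) };
        if rank(self.len) <= k {
            return None;
        }
        let (mut lo, mut hi) = (0usize, self.len - 1);
        while lo < hi {
            let mid = (lo + hi) / 2;
            if rank(mid + 1) > k { hi = mid } else { lo = mid + 1 }
        }
        Some(lo)
    }
}

/// LOUDS tree: "10" for a virtual super-root, then for every node in BFS
/// order, one 1-bit per child followed by a 0-bit. That is 2n + 1 bits.
/// Node i (BFS number) is identified with the i-th 1-bit.
pub struct Louds {
    bits: RankBits,
}

impl Louds {
    /// `degrees[i]` = number of children of node i, nodes numbered in BFS order.
    pub fn from_degrees(degrees: &[usize]) -> Self {
        let mut bits = vec![true, false];
        for &d in degrees {
            bits.extend(std::iter::repeat(true).take(d));
            bits.push(false);
        }
        Louds { bits: RankBits::from_bits(&bits) }
    }

    /// Node i's child block starts right after the i-th 0-bit.
    pub fn first_child(&self, node: usize) -> Option<usize> {
        let p = self.bits.select(false, node)? + 1;
        if p < self.bits.len && self.bits.get(p) {
            Some(self.bits.rank1(p))
        } else {
            None
        }
    }

    /// The 1-bit of a node lives in its parent's block; count the 0-bits before it.
    pub fn parent(&self, node: usize) -> Option<usize> {
        if node == 0 {
            return None;
        }
        let q = self.bits.select(true, node)?;
        Some(self.bits.rank0(q) - 1)
    }
}
```

For a root with two children where the first child has one child of its own, `Louds::from_degrees(&[2, 1, 0, 0])` produces the nine bits `101101000`, and both `parent` and `first_child` are answered purely from rank/select on that bitvector, with no per-node pointers.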

It would be interesting to see if it makes sense to use something like this for rowan.

Some reasons to do this:

using fewer bits for storing tree topology means that larger trees will fit into CPU caches, potentially improving performance
trees are heavy-weight, reducing RAM usage seems like a good idea
algorithmic complexity of operations becomes independent of the topology of the tree. I.e., the tree can be arbitrarily deep or arbitrarily wide

Some reasons not to do this:

performance will probably be worse
works only for read-only trees
high complexity of the implementation

CAD97 · 2021-08-15T21:15:44Z

Rowan is such nerd catnip for me.

https://doi.org/10.1145/2601073

Fully Functional Static and Dynamic Succinct Trees

For the dynamic case, where insertion/deletion (indels) of nodes is allowed, the existing data structures support a very limited set of operations. Our data structure builds on the range min-max tree to achieve 2n + O(n / log n) bits of space and O(log n) time for all operations supported in the static scenario, plus indels. We also propose an improved data structure using 2n + O(n log log n / log n) bits and improving the time to the optimal O(log n / log log n) for most operations. We extend our support to forests, where whole subtrees can be attached to or detached from others, in time O(log^(1+ϵ) n) for any ϵ > 0. Such operations had not been considered before.

The forest operations seem especially relevant to rust-analyzer/rowan use cases, since the main mutation use case is in grafting subtrees. It's definitely worth looking into... and I'm going to think about this until I experiment some with it... I sense a sorbus 0.2 coming some time in the future 😅

An interesting constraint that Rowan has over pure academic trees is the actual data storage on each node (in a side table for succinct data structures, typically, AIUI). Both interior nodes (for length offsets of children, for sublinear position search) and terminal nodes (for the actual text) want to themselves be dynamically sized, so size analysis is obviously more complicated, along with node deduplication.
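
As a rough illustration of the interior-node payload described above, here is a sketch of a side-table entry that stores cumulative child end offsets and uses binary search to find the child covering a text offset. The type `InteriorPayload` and its methods are hypothetical and do not reflect rowan's actual layout; the point is only that the payload is variable-sized (one entry per child) and lives apart from the tree topology.

```rust
/// Hypothetical side-table payload for one interior node.
/// child_ends[i] = end offset of child i, relative to the node's start.
pub struct InteriorPayload {
    child_ends: Vec<u32>,
}

impl InteriorPayload {
    /// Build the cumulative offsets from the text length of each child.
    pub fn from_child_lens(lens: &[u32]) -> Self {
        let mut total = 0u32;
        let child_ends = lens
            .iter()
            .map(|&len| {
                total += len;
                total
            })
            .collect();
        InteriorPayload { child_ends }
    }

    /// Total text length covered by this node.
    pub fn text_len(&self) -> u32 {
        self.child_ends.last().copied().unwrap_or(0)
    }

    /// Returns (child index, offset inside that child) for `offset`,
    /// in O(log children) via binary search instead of a linear scan.
    pub fn child_at(&self, offset: u32) -> Option<(usize, u32)> {
        // First child whose end lies strictly after `offset`.
        let idx = self.child_ends.partition_point(|&end| end <= offset);
        if idx == self.child_ends.len() {
            return None;
        }
        let start = if idx == 0 { 0 } else { self.child_ends[idx - 1] };
        Some((idx, offset - start))
    }
}
```

With child lengths 3, 1 and 4, the stored offsets are 3, 4 and 8, so offset 3 resolves to the start of the second child and offset 8 is out of range.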

CAD97 · 2021-08-15T21:32:21Z

One property of Rowan that I'm fairly sure succinct trees would lose is the independence of nodes. Currently, if you have a GreenNode, you keep just that node and its children alive. With a succinct tree, however, I'm fairly certain that the tree is one unit w.r.t. ownership. (Though I need to dig into how exactly the forest operations work.)

Is this restriction problematic for what rust-analyzer wants? Node payloads would still be able to be cached and deduplicated separately from the trees themselves.

If we're okay giving up partial sub-ownership of a green tree, we can reduce memory usage even without a succinct tree by using indices into an arena rather than pointers (as well as unlocking parent pointers in the green tree, if we want those).
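
A minimal sketch of the arena alternative, assuming 32-bit indices with a sentinel value in place of pointers. The names (`Arena`, `Node`, `NIL`) and the `kind: u16` field are illustrative assumptions, not rowan's types. Each node carries four `u32` links instead of 8-byte pointers, and the parent link comes for free because the whole tree is owned by the arena.

```rust
/// Sentinel meaning "no node"; keeps each link at 4 bytes.
const NIL: u32 = u32::MAX;

/// One tree node, linked by arena indices rather than pointers.
struct Node {
    kind: u16,
    parent: u32,
    first_child: u32,
    last_child: u32,
    next_sibling: u32,
}

/// Owns every node of the tree; nodes cannot outlive the arena.
pub struct Arena {
    nodes: Vec<Node>,
}

fn opt(id: u32) -> Option<u32> {
    if id == NIL { None } else { Some(id) }
}

impl Arena {
    pub fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Appends a node as the last child of `parent` (or as a root) in O(1).
    pub fn add(&mut self, kind: u16, parent: Option<u32>) -> u32 {
        let id = self.nodes.len() as u32;
        self.nodes.push(Node {
            kind,
            parent: parent.unwrap_or(NIL),
            first_child: NIL,
            last_child: NIL,
            next_sibling: NIL,
        });
        if let Some(p) = parent {
            let prev = self.nodes[p as usize].last_child;
            if prev == NIL {
                self.nodes[p as usize].first_child = id;
            } else {
                self.nodes[prev as usize].next_sibling = id;
            }
            self.nodes[p as usize].last_child = id;
        }
        id
    }

    pub fn kind(&self, id: u32) -> u16 {
        self.nodes[id as usize].kind
    }

    pub fn parent(&self, id: u32) -> Option<u32> {
        opt(self.nodes[id as usize].parent)
    }

    /// Iterates over the children of `id` by following sibling links.
    pub fn children(&self, id: u32) -> impl Iterator<Item = u32> + '_ {
        std::iter::successors(opt(self.nodes[id as usize].first_child), move |&c| {
            opt(self.nodes[c as usize].next_sibling)
        })
    }
}
```

The trade-off is the one named above: a node is only meaningful together with its arena, so holding on to one subtree keeps the entire tree alive.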

davidbarsky mentioned this issue Jun 24, 2024

A Plan for Making Rust Analyzer Faster rust-lang/rust-analyzer#17491

Open
