[ReshardingV3] State Witness - new state root generation #12074

Longarithm · 2024-09-11T00:08:53Z

We agreed that the nicest approach is to have the same code handling both resharding proof generation and verifying, similarly to trie read-write operations.
For now we describe only memtrie logic. Partial trie logic is TODO.

We will implement function against trie storage:

GenericTrie::cut(state_root, boundary_account, retain: Left/Right) -> (new_state_root, proof)

which main logic will be

GenericTrie::delete_multi_range(state_root, intervals: { [left_key, right_key) }) -> (new_state_root, proof)

cut

Simulates cutting out all trie keys by boundary_account, retaining left or right part based on retain parameter. To process resharding, if node tracks the left child, it needs to call this function for retain = Left to get new state root with proof, and vice versa for the right child.

Because trie keys start from different trie type bytes, we must execute several range removals.
For each trie key type t which includes account id (except PromiseYield), we remove the interval:

for retain = Left: [ [t] + boundary_account, [t + 1] )
for retain = Right: [ [t], [t] + boundary_account )

If trie key doesn't have account id, we do nothing. This is because we fully copy delayed, buffered and promise-yield receipts and discard them later.
We supply the resulting list of ranges for removal to the function below.

delete_multi_range

Recursively descends into children, removing all keys falling into intervals. Returns new state root and proof of deletion.
TODO: because the trie access logic is complex, I can't describe it using existing structs. For now, I will use only memtrie. But probably MemTrieUpdate can be reused.

delete_multi_range calls delete_multi_range_recursive(state_root: MemTrieNodeId, current_key: Vec<u8>, intervals, accesses: &mut TrieAccesses). This method will probably return MemTrieChanges. This itself will be given to apply_memtrie_changes to generate new state root. Initially current_key = vec![], accesses are empty.
TODO: not sure if MemTrieChanges fits here well.

delete_multi_range_recursive:

We need to make decision what to do with current range of keys. First, we decide separately on each interval [L, R) against current key. There are 3 decisions we can make.

DeleteAll - remove subtree completely. Leave Empty node.
Descend - descend into all children recursively and then recompute the node.
Skip - leave node as is. Nothing to be removed.

Note that subtree of key can contain only strings resulting by appending new symbols to key.

If key < L:
- If L starts with key, do Descend - because it is possible to append new symbols so they fall into [L, R). Example: key = 92, L = 92c. key < L, but its subtree may contain 92c, 92c0, etc - strings which fall into [L, R).
- Otherwise, do Skip. Whatever we append to key, the result will be still less than L.
If L <= key < R:
- If R starts with key, do Descend - because appending new symbols may give strings which are >= R. Example: key = d8, R = d823. Subtree of key may contain d85 which is > R.
- Otherwise, do DeleteAll. Whatever we append to key, the result will stay in range.
If key >= R:
- Skip.

The final decision is as follows:

if there is at least one DeleteAll - it is DeleteAll
otherwise, if there is at least one Descend - do Descend
otherwise Skip.

In the case of Descend, we visit all children, depending on the node type, collect resulting nodes and compute new node (possibly compress Branch to Extension).

For each node read we put it to TrieAccesses. These nodes form a proof. (Check: we don't need to read values, right?)

Note: looks like proof is essentially all paths to the intervals' boundaries. But I don't see a simple way to define delete_multi_range in terms of existing trie queries.

Next steps

Prototype for memtrie. Check that state root matches the one built from flat storage.
Same implementation for partial trie to verify generated proof.
Insert this logic into the right place of the block postprocessing logic.

Out of scope

Deduplicate memtrie & partial trie logic - for example, memtrie_lookup and lookup_from_state_column. It could be possible to unite these impls using templates. But for now memtrie is actively worked on by CR Team and after applied optimisations it may be hard to unite logic.

The text was updated successfully, but these errors were encountered:

Create the code flow intended for resharding state root creation. The ultimate goal will be the temporary memtries created, which will be used until new memtries are created from flat storage. This will be a multi-step work. To complete it, all mentioned TODOs must be resolved. ### Overview The core is a `MemTrieUpdate::retain_multi_range` function and a simple `test_retain_multi_range` test. It follows the logic in #12074. I plan to enhance logic sequentially until it will be fit for the actual resharding. Drawback is that code is not used on chain, but setting up full resharding test requires separate work. I suggest to start from the the incomplete flow; hopefully I will need ~5 PRs to set up the full test. It follows the "top-down" logic as in insert/delete functions: first we pull root node to in-memory update list, then we descend into children. In the end, we call `to_trie_changes` which calls `post_order_traverse_updated_nodes` to create full set of changes to be applied to trie. ### Alternative For the multi-range deletion, better logic is "bottom-up": we first understand the result of children subtrees, and then mark root node as updated if needed. It's not clear whether everything must be "bottom-up". It probably has its own challenges. But here it is strictly better because it automatically gives the post traversal order of updated nodes. We could compute hashes on the fly but it's less LoC to postpone it to reuse existing logic, so I just call `compute_hashes_and_serialized_nodes` in the end.

Next step on #12074. Supporting all cases I came up with where nodes restructuring is required. Surprisingly, `squash_node` is enough to call. I only needed to implement trivial cases which weren't possible for single key deletion. I implemented the tests first, and majority of them failed before changing the logic. Each test is comparing naive approach with `retain_multi_range`. ### Notes * I'm a bit scared that I didn't realise the need to squash Extension before. Fortunately, `squash_node` handles that, but if you feel some cases are not covered here, feel free to post suggestions! * Reused + copypasted some Robin' tooling to generate interesting nodes conversions. * Note that test for reading "extra" child node is not required because we always read all children. ### Next steps * A bit more testing * Similar logic for partial trie * Generating intervals needed for resharding * Use that to implement shard switch on chain

This was referenced Sep 11, 2024

[ReshardingV3] State Witness #12019

Open

feat: stub for instant resharding state root creation #12083

Merged

Longarithm mentioned this issue Sep 23, 2024

feat: full memtrie logic for range retain #12130

Merged

github-actions bot mentioned this issue Oct 1, 2024

Monthly issue metrics report #12175

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ReshardingV3] State Witness - new state root generation #12074

[ReshardingV3] State Witness - new state root generation #12074

Longarithm commented Sep 11, 2024 •

edited

Loading

[ReshardingV3] State Witness - new state root generation #12074

[ReshardingV3] State Witness - new state root generation #12074

Comments

Longarithm commented Sep 11, 2024 • edited Loading

cut

delete_multi_range

Next steps

Out of scope

Longarithm commented Sep 11, 2024 •

edited

Loading