We have seen that Perf/HalfPath state db key #6331 reduces database growth by about 50%, likely due to improved compression. A branch in which only one child changed, sitting next to the previous version of that branch, probably gets compressed really well.
BUT WAIT THERE'S MORE!
Since different paths no longer share hashes, we can approximately keep track of what is currently in the database, and on persist of a canonical block, we remove the old value.
This works surprisingly well: 90% (95% for state, 82% for storage) of the keys about to be persisted have a match in the LRU that tracks persisted nodes.
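To make the idea concrete, here is a minimal sketch in Python. It is not the code in this PR: the class name `PersistedNodeTracker`, the dict standing in for the database, and the string paths and hashes are all hypothetical. It only shows the mechanism described above: remember the last hash persisted at each path, and when a new node is persisted at the same path, delete the old `(path, hash)` key.

```python
from collections import OrderedDict


class PersistedNodeTracker:
    """Approximate record of which (path, hash) keys are in the database.

    Hypothetical illustration only; `db` is a plain dict keyed by (path, hash).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()  # path -> hash of the node last persisted there
        self.hits = 0
        self.lookups = 0

    def on_persist(self, db, path, node_hash, value):
        """Write a node of a canonical block and drop the value it replaces."""
        self.lookups += 1
        old_hash = self.lru.pop(path, None)
        if old_hash is not None:
            self.hits += 1
            if old_hash != node_hash:
                # The node previously stored at this path has been replaced.
                db.pop((path, old_hash), None)
        db[(path, node_hash)] = value
        self.lru[path] = node_hash  # re-inserted as most recently used
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)  # evict the least recently used path

    def hit_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0


db = {}
tracker = PersistedNodeTracker(capacity=2)
tracker.on_persist(db, "0a", "h1", b"v1")
tracker.on_persist(db, "0a", "h2", b"v2")  # removes ("0a", "h1") from db
```

When a path has been evicted from the LRU, the old value is simply left in the database, which is why the tracking is only approximate and why the hit rate matters.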
This does require additional memory for an LRU of 2 million nodes. In the best case this should take about 2M * 48 bytes of memory, with an additional 0.5M * 48 bytes to keep track of recommitted nodes to prevent them from being deleted. Only paths of nibble length <= 14 are tracked to save space.
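As a quick check of the memory estimate, the arithmetic can be written out as below. The 48 bytes per entry and the entry counts are taken from the text above; the constant and function names are made up for the example, and real overhead (object headers, dictionary buckets) would come on top of this best case.

```python
BYTES_PER_ENTRY = 48           # best-case size per tracked node, from the text
TRACKED_NODES = 2_000_000      # LRU of persisted nodes
RECOMMITTED_NODES = 500_000    # extra entries for recommitted nodes


def tracker_memory_bytes(entries, bytes_per_entry=BYTES_PER_ENTRY):
    """Best-case memory for `entries` tracked nodes."""
    return entries * bytes_per_entry


main = tracker_memory_bytes(TRACKED_NODES)
extra = tracker_memory_bytes(RECOMMITTED_NODES)
total = main + extra
print(f"{main / 1e6:.0f} MB + {extra / 1e6:.0f} MB = {total / 1e6:.0f} MB")
# prints "96 MB + 24 MB = 120 MB"
```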
Note that for storage, this means the key needs to keep the full 32-byte address, to prevent two storage tries from accidentally sharing the same (key, hash) tuple, which is unlikely but is an attack vector. This increases disk space use by about 2GB. Any difference in block processing time is much harder to notice, but there could be other overhead that is not obvious right now.
In a test run over blocks from Oct 9 to Oct 21, the database does grow more slowly, but growth is only reduced by half.
On a longer run, with blocks spanning from 9 Oct to about 20 Jan (about 3 months), state db size grew from 143GB to 164GB.
You can also see that the delete commands saturated enough of the database after about 2 weeks.
Halfpath without node removal grew from 143GB to 262GB.
On master (hash layout), state db grew from 174GB to 428GB.
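For comparison, the growth in each run can be computed directly from the sizes quoted above. This is just the arithmetic spelled out, assuming all three runs cover the same block range, which the numbers suggest but the description does not state explicitly.

```python
# (start GB, end GB) for each run, as quoted in the description
runs = {
    "halfpath + node removal": (143, 164),
    "halfpath, no node removal": (143, 262),
    "master (hash layout)": (174, 428),
}

growth = {name: end - start for name, (start, end) in runs.items()}
baseline = growth["halfpath + node removal"]

for name, gb in growth.items():
    print(f"{name}: +{gb}GB ({gb / baseline:.1f}x the growth with node removal)")
```

That is 21GB of growth with node removal, against 119GB for halfpath alone and 254GB on master.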
Changes
Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
Optional. Remove if not applicable.
Documentation
Requires documentation update
If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.
Requires explanation in Release Notes
If yes, fill in the details here. Remove if not applicable.
Remarks
Optional. Remove if not applicable.