fix: generic trie value updates #12344

Open
wants to merge 4 commits into
base: master

Longarithm
Member

@Longarithm Longarithm commented Oct 30, 2024

Preparation step for #12324.

Introduce generic_store_value and generic_delete_value, which let MemTrie and TrieStorage process value updates differently. TL;DR: memtries can work with inlined values, while trie storages must always have full values. The way accessed nodes are recorded also differs.

This also resolves some issues along the way. For example, the interfaces of Trie::insert and MemTrieUpdate::insert_impl currently diverge. The latter needs both a FlatStateValue for memtrie changes and an Option<Vec<u8>> for disk changes. There is a good reason for that: if the memtrie is loaded from flat state, we don't need to produce disk trie changes, so it is enough to load FlatStateValues.

However, the current interface is quite loose, so it is a net improvement to make it stricter by requiring every caller to pass ValueUpdate::MemtrieAndDisk or ValueUpdate::MemtrieOnly. The drawback is that Trie::insert can technically receive ValueUpdate::MemtrieOnly, in which case it will just panic. But well... it still looks better than an unclear memtrie interface that carries two values and hashes along the way.
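
To make the trade-off concrete, here is a minimal sketch of the stricter interface, not the actual nearcore code. Only the names ValueUpdate, MemtrieAndDisk, MemtrieOnly and FlatStateValue come from the description above. The payload types, the `DiskTrie` and `MemTrie` toy structs and their `insert` signatures are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a flat-state value (assumption): either the bytes
/// themselves (inlined) or only a reference to them (hash and length).
#[derive(Debug, Clone, PartialEq)]
enum FlatStateValue {
    Inlined(Vec<u8>),
    Ref { hash: [u8; 32], len: usize },
}

/// The two kinds of update named in the PR description.
#[derive(Debug, Clone, PartialEq)]
enum ValueUpdate {
    /// Full bytes: enough to produce both memtrie and disk changes.
    MemtrieAndDisk(Vec<u8>),
    /// Flat-state value only: enough for memtrie changes, not for disk.
    MemtrieOnly(FlatStateValue),
}

/// Toy disk-backed trie (hypothetical): it always needs the full bytes.
#[derive(Default)]
struct DiskTrie {
    values: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl DiskTrie {
    fn insert(&mut self, key: &[u8], update: ValueUpdate) {
        match update {
            ValueUpdate::MemtrieAndDisk(bytes) => {
                self.values.insert(key.to_vec(), bytes);
            }
            // The drawback mentioned above: the type allows this call,
            // but a disk trie cannot honour it.
            ValueUpdate::MemtrieOnly(_) => {
                panic!("disk trie cannot apply a MemtrieOnly update")
            }
        }
    }
}

/// Toy memtrie (hypothetical): accepts both variants.
#[derive(Default)]
struct MemTrie {
    values: BTreeMap<Vec<u8>, ValueUpdate>,
}

impl MemTrie {
    /// Returns true when the update also requires disk changes.
    fn insert(&mut self, key: &[u8], update: ValueUpdate) -> bool {
        let needs_disk = matches!(update, ValueUpdate::MemtrieAndDisk(_));
        self.values.insert(key.to_vec(), update);
        needs_disk
    }
}
```

In this sketch the caller states its intent once, through the variant, instead of passing a flat-state value and an optional full value side by side.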

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 84.00000% with 20 lines in your changes missing coverage. Please review.

Project coverage is 71.20%. Comparing base (daa67a3) to head (a273d14).
Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/store/src/trie/mem/updating.rs | 89.10% | 6 Missing and 5 partials ⚠️ |
| core/store/src/trie/insert_delete.rs | 66.66% | 0 Missing and 4 partials ⚠️ |
| core/store/src/trie/mem/loading.rs | 0.00% | 1 Missing and 1 partial ⚠️ |
| core/store/src/trie/mod.rs | 75.00% | 0 Missing and 2 partials ⚠️ |
| core/primitives/src/state.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #12344       +/-   ##
===========================================
+ Coverage   38.11%   71.20%   +33.09%     
===========================================
  Files         837      839        +2     
  Lines      168711   169781     +1070     
  Branches   168711   169781     +1070     
===========================================
+ Hits        64298   120891    +56593     
+ Misses     100525    43630    -56895     
- Partials     3888     5260     +1372     
| Flag | Coverage Δ |
|---|---|
| backward-compatibility | 0.16% <0.00%> (?) |
| db-migration | 0.16% <0.00%> (?) |
| genesis-check | 1.22% <0.00%> (?) |
| integration-tests | 39.00% <72.00%> (+0.89%) ⬆️ |
| linux | 70.64% <84.00%> (+32.53%) ⬆️ |
| linux-nightly | 70.78% <84.00%> (?) |
| macos | 50.43% <84.00%> (?) |
| pytests | 1.53% <0.00%> (?) |
| sanity-checks | 1.34% <0.00%> (?) |
| unittests | 64.27% <84.00%> (?) |
| upgradability | 0.21% <0.00%> (?) |


Contributor

@Trisfald Trisfald left a comment

🚀

pub enum ValueToInsert {
    /// Full value, works for all possible operations, triggers both memtrie
    /// and disk changes generation.
    Full(Vec<u8>),
Contributor

We could store the hash to avoid double computations, but I prefer this data layout.

Contributor

It would indeed be nice to avoid the double computation! It can be significant (1 µs per hash).

Member Author

I want to fix it here, see #12344 (comment)

}
// We change refcount only when we also make disk updates.
// In this case, we must have Full value.
fn add_refcount_to_value(&mut self, value: ValueToInsert) {
Contributor

It feels like the decision whether to add a refcount should be based on a boolean that says "do I need disk updates", not on whether the value to insert happens to be Full. I don't think the original interface was great either, but this kind of behavior feels implicit. Eh... but I guess I don't see a cleaner approach.

Member Author

Made the naming more explicit: MemtrieAndDisk/MemtrieOnly.
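
One way to read the rename is that the variant itself now answers "do I need disk updates?", so the refcount decision is no longer a side effect of the value happening to be full. The sketch below is a rough illustration under assumptions, not the code from this PR: `RefcountChanges` is a hypothetical container, refcounts are keyed by the value bytes for simplicity, and the payload of MemtrieOnly is omitted.

```rust
use std::collections::HashMap;

/// Simplified update type; the MemtrieOnly payload is omitted (assumption).
enum ValueUpdate {
    MemtrieAndDisk(Vec<u8>),
    MemtrieOnly,
}

/// Hypothetical accumulator for disk refcount changes, keyed by value bytes.
#[derive(Default)]
struct RefcountChanges {
    refcounts: HashMap<Vec<u8>, i32>,
}

impl RefcountChanges {
    /// Refcounts change only when disk updates are produced,
    /// which the variant name states explicitly.
    fn add_refcount_to_value(&mut self, update: &ValueUpdate) {
        match update {
            ValueUpdate::MemtrieAndDisk(bytes) => {
                *self.refcounts.entry(bytes.clone()).or_insert(0) += 1;
            }
            // No disk changes are generated, so there is nothing to count.
            ValueUpdate::MemtrieOnly => {}
        }
    }
}
```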

@Longarithm Longarithm changed the title from "fix: same interface for trie insertion" to "fix: generic trie value updates" Oct 30, 2024
@Longarithm
Member Author

@robin-near @Trisfald I'm sorry, but I need an additional round of review here. I realised something that changed my goal.

I thought I hadn't introduced an additional hash computation, but I actually did. After looking at the whole flow, the easiest way forward seemed to be to "do the right thing now".

Now I additionally introduce generic_store_value and generic_delete_value, which let MemTrie and TrieStorage process value updates differently. TL;DR: memtries can work with inlined values, while trie storages must always have full values. The way accessed nodes are recorded also differs.

As a nice consequence, for memtries hash computation is postponed until after all inserts/deletes are processed.

I think the confusion about hashes is partially caused by the fact that for memtries they are generally already known in FlatStateValue, so it looks easy to just take the hash as a field. But in insert we have only the full value at first, and conversion to FlatStateValue triggers additional hash computations for the majority of values.
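
The effect of postponing hashing can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `PendingUpdates`, `store_value`, `delete_value`, `finalize` and `toy_hash` are hypothetical names, and the standard library's `DefaultHasher` stands in for the real cryptographic hash. Updates are buffered as raw bytes, and a hash is computed only for values that survive once all inserts and deletes have been processed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the real value hash (assumption: not cryptographic).
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical buffer of value updates; `None` marks a deleted key.
#[derive(Default)]
struct PendingUpdates {
    pending: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl PendingUpdates {
    /// Record a value without hashing it.
    fn store_value(&mut self, key: &[u8], value: Vec<u8>) {
        self.pending.insert(key.to_vec(), Some(value));
    }

    /// Record a deletion; any earlier buffered value for the key is dropped.
    fn delete_value(&mut self, key: &[u8]) {
        self.pending.insert(key.to_vec(), None);
    }

    /// Hash each surviving value exactly once.
    /// Returns the hashes and the number of hash computations performed.
    fn finalize(self) -> (BTreeMap<Vec<u8>, u64>, usize) {
        let mut hashes = BTreeMap::new();
        let mut hash_calls = 0;
        for (key, value) in self.pending {
            if let Some(bytes) = value {
                hashes.insert(key, toy_hash(&bytes));
                hash_calls += 1;
            }
        }
        (hashes, hash_calls)
    }
}
```

With eager hashing, overwriting a key twice and then deleting another key would cost one hash per stored value. In this sketch only the final value of each surviving key is hashed.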

Commit based on the previous version: a273d14

3 participants