Add storage cache for child trie and notification internals #2639
Conversation
core/client/db/src/lib.rs
Outdated
- fn update_storage(&mut self, update: Vec<(Vec<u8>, Option<Vec<u8>>)>) -> Result<(), client::error::Error> {
+ fn update_storage(
+     &mut self,
+     update: Vec<(Vec<u8>, Option<Vec<u8>>)>,
Do you think it's worth introducing some typedefs for this to improve readability? The types now seem pretty complicated (a lot of Vec and u8 mixed together :))
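A minimal sketch of what such typedefs might look like (the alias and function names here are illustrative, not taken from the PR):

```rust
// Hypothetical type aliases for readability; names are illustrative only.
pub type StorageKey = Vec<u8>;
pub type StorageValue = Vec<u8>;
/// A batch of changes: `None` marks a deleted key.
pub type StorageUpdate = Vec<(StorageKey, Option<StorageValue>)>;

/// Example consumer of the aliased type.
pub fn count_deletions(update: &StorageUpdate) -> usize {
    update.iter().filter(|(_, v)| v.is_none()).count()
}
```

The signature then reads as `update: StorageUpdate` instead of the nested tuple-of-Vecs.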
core/client/db/src/storage_cache.rs
Outdated
@@ -27,7 +27,7 @@ use log::trace;

const STATE_CACHE_BLOCKS: usize = 12;

- type StorageKey = Vec<u8>;
+ type StorageKey = (Option<Vec<u8>>, Vec<u8>);
Might be worth introducing a proper struct for this? We could have StorageKey::root(key) or StorageKey::child(child_key, key) for instantiation.
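A sketch of the suggested struct (field names are assumed; only the constructor names come from the comment above):

```rust
// Hypothetical struct replacing the (Option<Vec<u8>>, Vec<u8>) tuple.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct StorageKey {
    /// `Some` when the key lives in a child trie.
    child_key: Option<Vec<u8>>,
    key: Vec<u8>,
}

impl StorageKey {
    /// A key in the top-level (root) trie.
    pub fn root(key: Vec<u8>) -> Self {
        StorageKey { child_key: None, key }
    }
    /// A key inside the child trie identified by `child_key`.
    pub fn child(child_key: Vec<u8>, key: Vec<u8>) -> Self {
        StorageKey { child_key: Some(child_key), key }
    }
    pub fn is_child(&self) -> bool {
        self.child_key.is_some()
    }
}
```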
core/client/db/src/storage_cache.rs
Outdated
let childs = child_changes
    .into_iter()
    .map(|(k, i)| (Some(k), i))
    .chain(::std::iter::once((None, changes)));
:: shouldn't be required in edition=2018
- .chain(::std::iter::once((None, changes)));
+ .chain(std::iter::once((None, changes)));
childs.for_each(|(sk, changes)|
    for (k, v) in changes.into_iter() {
        let k = (sk.clone(), k);
        if is_best {
You can avoid the else clause and the duplication, without additional clones, like this:
if is_best {
    cache.hashes.remove(&k);
    CachingState::<H, S, B>::storage_insert(cache, k.clone(), v);
}
modifications.insert(k);
.chain(::std::iter::once((None, changes)));
childs.for_each(|(sk, changes)|
    for (k, v) in changes.into_iter() {
        let k = (sk.clone(), k);
Is there a way to prevent sk.clone() here? Could we use Cow for the child_key?
This would need the tuple to implement Borrow, which is probably not correct. It relates to the idea of having a proper StorageKey type, made such that it can implement Borrow without instantiation (keeping an internal Vec with an offset seems like the easiest way to me).
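To make the "internal Vec with offset" idea concrete, here is a rough sketch (not the PR's code; the EncodedKey name and the prefix encoding are assumptions): by encoding the optional child key and the key into a single buffer, the cache key can implement Borrow<[u8]> and be looked up from a borrowed slice, avoiding the awkward owned-tuple instantiation on query.

```rust
use std::borrow::Borrow;

// Sketch only: encode (Option<child_key>, key) injectively into one Vec<u8>.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct EncodedKey(Vec<u8>);

impl EncodedKey {
    /// 0x00 prefix = top-level key.
    pub fn top(key: &[u8]) -> Self {
        let mut buf = vec![0u8];
        buf.extend_from_slice(key);
        EncodedKey(buf)
    }
    /// 0x01 prefix, then a length-prefixed child key
    /// (assumes child keys shorter than 256 bytes, for the sketch).
    pub fn child(child_key: &[u8], key: &[u8]) -> Self {
        let mut buf = vec![1u8, child_key.len() as u8];
        buf.extend_from_slice(child_key);
        buf.extend_from_slice(key);
        EncodedKey(buf)
    }
}

// `Vec<u8>` hashes like `[u8]`, so HashMap lookups by `&[u8]` stay consistent
// with the derived Hash/Eq, as Borrow requires.
impl Borrow<[u8]> for EncodedKey {
    fn borrow(&self) -> &[u8] {
        &self.0
    }
}
```

A tuple cannot do this, because Borrow must agree with Hash and Eq, and there is no blanket Borrow from (Option<Vec<u8>>, Vec<u8>) to a borrowed form.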
Ok, my previous comment probably does not make much sense. HStorageKey is interesting, but the problem here is not that complicated.
The only simple, non-break-all-API way of solving it I see is to put child entries in their own map storage. As said before, that breaks the lru expectation.
@tomusdrw I did notice that there are already two lru maps. Is that intended, or would a single lru for hashes and decoded values be a better choice?
So the way to fix this could be to use the internal lru storage directly (some doublelinkedhashmap) and manage the lru size limit globally for CachingState (and of course move the child cache into its own two-level map). This lru size management does not look too complicated, and it would allow a single lru size.
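A rough sketch of "manage the lru limit size globally": one LRU over all entries with a single byte-size budget, evicting the least recently used entry until the total fits. This is illustrative only (the actual code uses an lru crate, not this hand-rolled structure, and a real implementation would use a linked hash map instead of the O(n) order vector):

```rust
use std::collections::HashMap;

// Sketch: a single LRU bounded by total data size in bytes.
pub struct SizedLru {
    map: HashMap<Vec<u8>, Vec<u8>>,
    order: Vec<Vec<u8>>, // front = least recently used (O(n); fine for a sketch)
    used: usize,
    limit: usize,
}

impl SizedLru {
    pub fn new(limit: usize) -> Self {
        SizedLru { map: HashMap::new(), order: Vec::new(), used: 0, limit }
    }

    pub fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) {
        // Replace an existing entry, updating the size accounting.
        if let Some(old) = self.map.remove(&key) {
            self.used -= key.len() + old.len();
            self.order.retain(|k| k != &key);
        }
        self.used += key.len() + value.len();
        self.order.push(key.clone());
        self.map.insert(key, value);
        // Evict from the cold end until we are back under the global limit.
        while self.used > self.limit {
            let victim = self.order.remove(0);
            let v = self.map.remove(&victim).expect("order and map in sync");
            self.used -= victim.len() + v.len();
        }
    }

    pub fn get(&mut self, key: &[u8]) -> Option<&Vec<u8>> {
        if self.map.contains_key(key) {
            // Mark as most recently used.
            self.order.retain(|k| k.as_slice() != key);
            self.order.push(key.to_vec());
        }
        self.map.get(key)
    }

    pub fn used_size(&self) -> usize {
        self.used
    }
}
```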
I'm not familiar with how the caches are implemented; however, it seems that child storage is currently used mostly for contracts. This means that having two separate lru's might make sense, because we are switching between the runtime context (which has its own top-level cache) and the contract context (which has its own child cache). So after finishing the contract context we will still have all the necessary top-level items in the cache, no matter what the contract did. I'm not sure if that's something we are going to keep though; if we do, it might require a different solution.
There is also the case of contracts talking to each other, but I think you spotted it: having separate management could make sense (the top level could be seen as more useful, and the child contract lru could use another size), plus it is more direct to implement 👍
Ok, I am not awake 🤦♂️. There is already the kind of merged lru logic that I described (we just get some overhead by using lru instead of its inner linkedhashmap struct), so I just need to keep using it that way.
pub fn new_shared_cache<B: Block, H: Hasher>(shared_cache_size: usize) -> SharedCache<B, H> {

substrate/core/client/db/src/storage_cache.rs, line 136 in 69dd3b1:
if let Some(v_) = &v {
core/client/src/notifications.rs
Outdated
self.next_id += 1;
let next_id = self.next_id;
I'd rather call it current_id
core/client/src/notifications.rs
Outdated
- .entry(c_key.clone())
- .or_insert_with(Default::default);
+ (c_key.clone(), if let Some(keys) = o_keys {
The code feels complicated here. I'd try to extract it similarly to notify, so that:
add_listeners(filter_keys, &mut self.wildcard_listeners, &mut self.listeners);
can be re-used for top-level keys and child keys.
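A sketch of what the extracted helper could look like (the signature and the SubscriberId type are assumptions, not the PR's code): a None filter means "listen to everything" (wildcard), and Some(keys) registers the subscriber per key, so the same function can serve both the top-level listener maps and each child trie's listener map.

```rust
use std::collections::{HashMap, HashSet};

type SubscriberId = u64;

// Hypothetical extracted helper, reusable for top-level and child keys.
fn add_listeners(
    subscriber: SubscriberId,
    filter_keys: Option<&[Vec<u8>]>,
    wildcard_listeners: &mut HashSet<SubscriberId>,
    listeners: &mut HashMap<Vec<u8>, HashSet<SubscriberId>>,
) {
    match filter_keys {
        // No filter: subscribe to every key.
        None => {
            wildcard_listeners.insert(subscriber);
        }
        // Filtered: register the subscriber under each requested key.
        Some(keys) => {
            for key in keys {
                listeners.entry(key.clone()).or_default().insert(subscriber);
            }
        }
    }
}
```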
let child_filters = Some([
    (StorageKey(vec![4]), None),
    (StorageKey(vec![5]), None),
].into_iter().cloned().collect());
Why into_iter().cloned().collect()? Can't you just do vec![...] directly, or is it a HashSet?
Yes, a HashMap.
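For illustration, that collect is what turns the pairs into a map, which vec![...] alone would not do (the function name and value type here are just for the example):

```rust
use std::collections::HashMap;

// Why the iterate-and-collect dance: the (key, value) pairs are collected into
// a HashMap, not kept as a Vec. (On pre-2021 editions, `.into_iter()` on an
// array iterated over references, hence the `.cloned()` in the original code;
// `.iter().cloned()` is the edition-independent spelling used here.)
fn child_filters() -> HashMap<Vec<u8>, Option<()>> {
    [(vec![4u8], None), (vec![5u8], None)]
        .iter()
        .cloned()
        .collect()
}
```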
Co-Authored-By: Tomasz Drwięga <[email protected]>
I've been trying a bit to use a single key for child tries (see HStorageKey in the latest commit), but it seems counterproductive: this is quite orthogonal to #2209, and trying to generalize its usage pushes things way too far.
In latest commit:
There is still a problem with those lru caches: there are 4 lru lists sharing one size limit, so the removal strategy is local (last use of the lru list where we insert content)...
The PR just copies what is done on general storage over to child storage, but there is a bit of a gray area around the actual use case of the shared cache (its difference from the local cache).
Hash queries are only used for CODE and maybe a couple of other keys at the moment, so the cache for hashes does not really grow. I'd put a fixed limit on hashes for now, 64 kilobytes maybe. As for the main/child ratio, let's make it 50/50, but add an additional configuration option that enables/disables the child trie cache.
Also, I would not bother taking any overhead into the size calculations, such as the internal overhead of linked hash map items or vector capacity. These depend heavily on the internals of external libraries and the allocator, which might change with a new version. We'd just say that the configured size is the data size and not the consumed memory size, which might be 20% or so higher on average.
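The accounting rule being proposed could be sketched like this (the function name is hypothetical, not from the PR):

```rust
// Count only raw key and value bytes toward the cache size; allocator and
// container overhead are deliberately ignored, per the discussion above.
fn entry_data_size(key: &[u8], value: Option<&[u8]>) -> usize {
    key.len() + value.map_or(0, |v| v.len())
}
```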
Removal of the child-trie-hash lru. Fix the lru storage value for the hashes lru.
So FWIU, hashes lru use is pretty limited, so I switched to the proposed fixed size of 64kB and removed the hashes lru ratio configuration. Similarly, lru_child_hashes probably does not make much sense, so I removed it too. Also removed the fixed per-element overhead; I wanted it to account for the use case of spammed small key-values, but that use case is partially limited by the fact that the key length needs to grow for the total to grow.
core/client/db/src/storage_cache.rs
Outdated
+ self.lru_child_storage.used_size()
+ self.lru_child_hashes.used_size()
// ignero small hashes storage + self.lru_hashes.used_size()
typo
* child cache, and test failing notifications
* fix tests and no listen child on top wildcard
* remove useless method
* bump impl version
* Update core/client/src/notifications.rs Co-Authored-By: Tomasz Drwięga <[email protected]>
* Update core/client/src/notifications.rs Co-Authored-By: Tomasz Drwięga <[email protected]>
* Update core/client/src/notifications.rs Co-Authored-By: Tomasz Drwięga <[email protected]>
* Update core/client/src/notifications.rs Co-Authored-By: Tomasz Drwięga <[email protected]>
* factoring notification methods to remove some redundant code.
* test child sub removal
* HStorage implementation and some type alias.
* Remove HStorage cache: does not fit
* fix removal
* Make cache use byte length (shared) instead of number of kv
* Make use of hashes cache in rpc
* applying ratio on different lru caches
* Fix format
* break a line
* Remove per element overhead of lru cache.
* typo
This PR adds child trie (CT) key-value storage to the cache (see changes to the into_committed method). From this point, this PR also plugs child trie values into the notification mechanism:
@tomusdrw @jacogr and all, these choices may not be really adequate (if we want a separate api for subscribing to CT values, part of this PR is wrong); should we consider the first two points ok?
It is quite straightforward, but it keeps pointing to the fact that the CT api may be merged into the standard api: there is a lot of redundancy, as in all CT code.
One bad point of the PR is the use of (Option<Vec<u8>>, Vec<u8>) as the key of the lru cache (we cannot use two caches as usual, since a single lru seems better). This leads to very awkward vec instantiation on query (no Borrow implementation for the tuple).
I think, though this depends on whether we want to unify the CT api into the parent api, that a StoragePath struct with an encoded offset could do the job (depending on the Ord property, the offset encoding position would change). StoragePath would replace the pair of child storage key and child key with an expanded path plus the position of the child switch in the path. This seems like an adequate representation (CTs are quite similar to a standard trie branch, but with a hop).
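A sketch of that StoragePath idea (the struct shape and accessor names are assumptions, not code from the PR): one expanded path buffer plus the position of the child-trie switch, instead of a (Option<Vec<u8>>, Vec<u8>) pair.

```rust
// Sketch only: expanded path with the child-switch offset recorded.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct StoragePath {
    path: Vec<u8>,
    /// `Some(n)`: bytes `0..n` are the child storage key, `n..` the key inside it.
    child_switch: Option<usize>,
}

impl StoragePath {
    pub fn top(key: &[u8]) -> Self {
        StoragePath { path: key.to_vec(), child_switch: None }
    }
    pub fn child(child_key: &[u8], key: &[u8]) -> Self {
        let mut path = child_key.to_vec();
        path.extend_from_slice(key);
        StoragePath { path, child_switch: Some(child_key.len()) }
    }
    /// The child storage key, if this path enters a child trie.
    pub fn child_key(&self) -> Option<&[u8]> {
        self.child_switch.map(|n| &self.path[..n])
    }
    /// The key inside its trie (top-level or child).
    pub fn key(&self) -> &[u8] {
        &self.path[self.child_switch.unwrap_or(0)..]
    }
}
```

Both parts stay borrowable as slices of the one buffer, which is what makes allocation-free queries plausible.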