This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add storage cache for child trie and notification internals #2639

Merged
merged 27 commits on Jun 14, 2019

Conversation

cheme (Contributor) commented May 21, 2019

This PR adds child trie (CT) key-value storage to the cache (see the changes to the into_committed method).

  • Notification part

From this point, this PR also plugs child trie values into the notification mechanism:

  • CT storage changes use the same subscription as the top trie
  • a wildcard on the top trie does not imply a wildcard on CTs
  • RPC remains unchanged at this point (non-breaking PR; another one will be needed to change the storage subscription)

@tomusdrw @jacogr and all, those choices may not be entirely adequate (if we want a separate API for subscribing to CT values, part of this PR is wrong); can we consider the first two points OK?

  • Storage cache part

Quite straightforward, but it keeps pointing to the fact that the CT API may need to be merged into the standard API: there is a lot of redundancy, as in all CT code.
One bad point of the PR is the use of (Option<Vec<u8>>, Vec<u8>) as the key of the LRU cache (we cannot use two caches as usual, since a single LRU seems better).
This leads to very awkward Vec instantiation on query (there is no suitable Borrow implementation for the tuple).
I think (though this depends on whether we want to unify the CT API into the parent API) that a StoragePath struct with an encoded offset could do the job (depending on the desired Ord property, the offset encoding position would change).
StoragePath would replace the pair of child storage key and child key with an expanded path plus the position of the child switch within the path. This seems like an adequate representation (a CT is quite similar to a standard trie branch, but with a hop); a rough sketch follows.
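A minimal sketch of the idea, with hypothetical names and layout (this is not what the PR currently implements):

```rust
/// Hypothetical expanded storage path: the child storage key (if any)
/// followed by the key inside that trie, kept in one buffer, with the
/// position of the child switch recorded as an offset.
#[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct StoragePath {
    path: Vec<u8>,
    /// Byte offset of the hop into the child trie; `None` for top-trie keys.
    child_switch: Option<usize>,
}

impl StoragePath {
    /// Key in the top trie.
    pub fn top(key: &[u8]) -> Self {
        StoragePath { path: key.to_vec(), child_switch: None }
    }

    /// Key inside the child trie identified by `child_storage_key`.
    pub fn child(child_storage_key: &[u8], key: &[u8]) -> Self {
        let mut path = Vec::with_capacity(child_storage_key.len() + key.len());
        path.extend_from_slice(child_storage_key);
        path.extend_from_slice(key);
        StoragePath { path, child_switch: Some(child_storage_key.len()) }
    }
}
```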

@cheme cheme added the Z7-question Issue is a question. Closer should answer. label May 21, 2019
fn update_storage(&mut self, update: Vec<(Vec<u8>, Option<Vec<u8>>)>) -> Result<(), client::error::Error> {
fn update_storage(
&mut self,
update: Vec<(Vec<u8>,Option<Vec<u8>>)>,
Contributor:

Do you think it's worth introducing some typedefs for this to improve readability?
Right now the type(s) seem pretty complicated (a lot of Vec and u8 mixed together :))

@@ -27,7 +27,7 @@ use log::trace;

const STATE_CACHE_BLOCKS: usize = 12;

type StorageKey = Vec<u8>;
type StorageKey = (Option<Vec<u8>>, Vec<u8>);
Contributor:

Might be worth introducing a proper struct for this? We could have StorageKey::root(key) or StorageKey::child(child_key, key) for instantiation
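Something along these lines, for instance (a hedged sketch of the suggestion, not code from the PR):

```rust
/// Sketch of a dedicated key type wrapping the (Option<Vec<u8>>, Vec<u8>) tuple.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct StorageKey {
    /// Child storage key, or `None` for the top trie.
    child_key: Option<Vec<u8>>,
    /// Key inside the (child or top) trie.
    key: Vec<u8>,
}

impl StorageKey {
    pub fn root(key: Vec<u8>) -> Self {
        StorageKey { child_key: None, key }
    }

    pub fn child(child_key: Vec<u8>, key: Vec<u8>) -> Self {
        StorageKey { child_key: Some(child_key), key }
    }
}
```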

let childs = child_changes
.into_iter()
.map(|(k,i)|(Some(k),i))
.chain(::std::iter::once((None, changes)));
Contributor:

:: shouldn't be required in edition=2018

Suggested change
.chain(::std::iter::once((None, changes)));
.chain(std::iter::once((None, changes)));

childs.for_each(|(sk, changes)|
for (k, v) in changes.into_iter() {
let k = (sk.clone(), k);
if is_best {
Contributor:

You can avoid the else clause and the duplication, without additional clones, like this:

if is_best {
  cache.hashes.remove(&k);
  CachingState::<H, S, B>::storage_insert(cache, k.clone(), v);
}
modifications.insert(k);

.chain(::std::iter::once((None, changes)));
childs.for_each(|(sk, changes)|
for (k, v) in changes.into_iter() {
let k = (sk.clone(), k);
Contributor:

Is there a way to prevent sk.clone() here? Could we use Cow for the child_key?

Contributor Author:

This would need the tuple to implement Borrow, which is probably not correct. It relates to the idea of having a proper StorageKey type, and making that type implement Borrow without instantiation (keeping an internal Vec with an offset seems like the easiest way to me).
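For illustration, one hedged way a single-buffer key could make Borrow-based lookups work without rebuilding an owned tuple (hypothetical, not in this PR): the offset is encoded into the buffer itself, so Eq/Hash of the owned key and of a borrowed slice built the same way agree.

```rust
use std::borrow::Borrow;

/// Hypothetical cache key flattened into one canonical byte string:
/// a length prefix for the child storage key, then the child storage key,
/// then the storage key. Lookups can then be done from a borrowed `[u8]`
/// (e.g. a reusable scratch buffer) instead of cloning into a fresh tuple.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct CacheKey(Vec<u8>);

impl CacheKey {
    pub fn new(child_key: Option<&[u8]>, key: &[u8]) -> Self {
        let child_key = child_key.unwrap_or(&[]);
        let mut buf = Vec::with_capacity(4 + child_key.len() + key.len());
        // The length prefix keeps distinct (child_key, key) pairs from
        // flattening to identical bytes.
        buf.extend_from_slice(&(child_key.len() as u32).to_le_bytes());
        buf.extend_from_slice(child_key);
        buf.extend_from_slice(key);
        CacheKey(buf)
    }
}

impl Borrow<[u8]> for CacheKey {
    fn borrow(&self) -> &[u8] {
        &self.0
    }
}
```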

Contributor Author:

OK, my previous comment probably does not make much sense; HStorageKey is interesting, but the problem here is not that complicated.
The only simple, non-API-breaking way of solving it that I see is to put child entries in their own map storage. As said before, it breaks the LRU expectation.
@tomusdrw I did notice that there are already two LRU maps; is that intended, or would a single LRU for hashes and decoded values be a better choice?
So the way to fix this could be to use the internal LRU storage directly (some double-linked hash map) and manage the LRU size limit globally for CachingState (and of course move the child cache to its own two-level map). This LRU size management does not look too complicated, and it would allow a single LRU size.

Contributor:

I'm not familiar with how the caches are implemented, but it seems that child storage is currently used mostly for contracts. This means that having two separate LRUs might make sense, because we switch between the runtime context (which has its own top-level cache) and the contract context (which has its own child cache). So after finishing the contract context we would still have all the necessary top-level items in the cache, no matter what the contract did. I'm not sure that's something we are going to keep, though; if we do, it might require a different solution.

Contributor Author:

There is also the case of contracts talking to each other, but I think you spotted it: having separate management could make sense (the top level could be seen as more useful, and the child contract LRU could use a different size), plus it is more direct to implement 👍

Contributor Author:

OK, I am not awake 🤦‍♂️: there is already the kind of merged LRU logic that I describe (we just get some overhead by using the LRU instead of its inner linked hash map struct), so I just need to keep using it that way.

pub fn new_shared_cache<B: Block, H: Hasher>(shared_cache_size: usize) -> SharedCache<B, H> {
and
if let Some(v_) = &v {

self.next_id += 1;
let next_id = self.next_id;
Contributor:

I'd rather call it current_id

.entry(c_key.clone())
.or_insert_with(Default::default);

(c_key.clone(), if let Some(keys) = o_keys {
Contributor:

The code feels complicated here; I'd try to extract it similarly to what is done in notify, so that:

add_listeners(filter_keys, &mut self.wildcard_listeners, &mut self.listeners);

can be re-used for top-level keys and child keys
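For example, something with roughly this shape (a sketch; the actual field types in notifications.rs may differ):

```rust
use std::collections::{HashMap, HashSet};

type SubscriberId = u64;
type Key = Vec<u8>;

/// Sketch of the proposed helper: register one subscriber either as a
/// wildcard listener (no filter) or under each filtered key, so the same
/// routine serves both top-level and child-trie subscriptions.
fn add_listeners(
    subscriber: SubscriberId,
    filter_keys: Option<&[Key]>,
    wildcard_listeners: &mut HashSet<SubscriberId>,
    listeners: &mut HashMap<Key, HashSet<SubscriberId>>,
) {
    match filter_keys {
        None => {
            wildcard_listeners.insert(subscriber);
        }
        Some(keys) => {
            for key in keys {
                listeners.entry(key.clone()).or_default().insert(subscriber);
            }
        }
    }
}
```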

let child_filters = Some([
(StorageKey(vec![4]), None),
(StorageKey(vec![5]), None),
].into_iter().cloned().collect());
Contributor:

Why into_iter().cloned().collect()? Can't you just do vec![...] directly, or is it a HashSet?

Contributor Author:

Yes, it's a HashMap.

cheme (Contributor Author) commented May 22, 2019

I've been trying a bit to use a single key for child tries (see HStorageKey in the latest commit), but it seems to be counterproductive: it is quite orthogonal to #2209, and trying to generalize its usage is pushing things way too far.
I think I will drop this approach and create a PR as @tomusdrw suggested.
Maybe if #2209 changes to a full key-path approach, HStorageKey could be resurrected (or for higher-level operations where cycles need to be forbidden; but the Ord property might not be interesting, and a standard encoding may be more straightforward).
Edit: removed; HStorage ordering preservation could maybe make sense for #2622.

cheme (Contributor Author) commented May 24, 2019

In the latest commit:

  • fix the LRU caches to use memory estimation for all LRUs (removing the very small overhead of LruCache)
  • change the LRU caches to respect the configured memory limit
  • change RPC calls to use the hashes cache
  • apply a ratio over the different LRU caches

There is still a problem with those LRU caches: 4 LRU lists share one size limit, and because the removal strategy is local (last use within the LRU list we insert into), one LRU can deplete another; basically, whoever uses the memory first keeps it.
I am thinking of using 4 counters with a ratio of the given memory (a child ratio and a hash ratio need to be defined and combined: the hash ratio applied first, then the child ratio on the result); see the sketch below.
This change would also allow a cleaner LRU struct than the current tools method.
Any opinion on default values for them? I would go with hash 10% and child 2% (assuming there is no child use; if you do use it, you really want to change that value to 90% or so).
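To make the combination concrete, here is a small sketch of how I read the proposal (hash ratio applied first, child ratio on the result); the helper and the numbers are only illustrative:

```rust
/// Hypothetical split of one total budget into the four LRU budgets:
/// (top values, child values, top hashes, child hashes).
fn split_budget(total: usize, hash_ratio: f64, child_ratio: f64) -> [usize; 4] {
    // Hash ratio applied first over the whole budget...
    let hashes = (total as f64 * hash_ratio) as usize;
    let values = total - hashes;
    // ...then the child ratio over each of the two results.
    let child_hashes = (hashes as f64 * child_ratio) as usize;
    let child_values = (values as f64 * child_ratio) as usize;
    [values - child_values, child_values, hashes - child_hashes, child_hashes]
}

// Example with the defaults floated above (hash 10%, child 2%):
// split_budget(32 * 1024 * 1024, 0.10, 0.02)
```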

@cheme cheme added A0-please_review Pull request needs code review. and removed Z7-question Issue is a question. Closer should answer. labels May 24, 2019
gavofyork (Member) commented:

@cheme who can review this effectively? @arkpar?

cheme (Contributor Author) commented Jun 4, 2019

The PR just copies what is done on general storage for child storage, but there is a bit of a gray area around the actual use case of the shared cache (its difference from the local cache).
So it would be best if the original code author could take a peek, cc @arkpar (I would also welcome a better design idea than the ratio I put in place).

arkpar (Member) commented Jun 7, 2019

Hash queries are only used for CODE and maybe a couple of other keys at the moment, so the cache for hashes does not really grow. I'd put a fixed limit on hashes for now, maybe 64 kilobytes. As for the main/child ratio, let's make it 50/50 but add an additional configuration option that enables/disables the child trie cache.
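For reference, a hedged sketch of that sizing scheme (the names and the configuration flag are hypothetical):

```rust
/// Hypothetical fixed budget for the hashes cache.
const FIXED_HASHES_CACHE_SIZE: usize = 64 * 1024; // 64 KiB

/// Split the remaining budget 50/50 between main and child value caches,
/// unless child-trie caching is disabled, in which case the main cache
/// gets everything.
fn value_budgets(total: usize, cache_child_trie: bool) -> (usize, usize) {
    let remaining = total.saturating_sub(FIXED_HASHES_CACHE_SIZE);
    if cache_child_trie {
        (remaining / 2, remaining - remaining / 2)
    } else {
        (remaining, 0)
    }
}
```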

arkpar (Member) commented Jun 7, 2019

Also, I would not bother taking any overhead into the size calculations, such as the internal overhead of linked hash map items or vector capacity. These depend heavily on the internals of external libraries and the allocator, which might change with a new version. We'd just say that the configured size is the data size and not the consumed memory size, which might be 20% or so higher on average.

cheme added 2 commits June 7, 2019 15:46
Removal of child-trie-hash lru.
Fix lru storage value for hashes lru.
cheme (Contributor Author) commented Jun 7, 2019

So FWIU, usage of the hashes LRU is pretty limited, so I switched to the proposed fixed size of 64 KB and removed the hashes LRU ratio configuration.

Similarly, lru_child_hashes probably does not make much sense, so I removed it too.

Also removed the fixed per-element overhead. I wanted it to account for the use case of spammed small key-values, but this use case is partially limited by the key length, which needs to grow in order to grow in value.
If I want to put it back, it probably makes more sense to apply it directly when reading the LRU size, and simply add number_of_elements * fixed_cost; a small sketch of that idea is below.
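A tiny sketch of that alternative, with a made-up constant:

```rust
/// Hypothetical per-entry overhead charged only when the size is read,
/// instead of being folded into every insertion.
const PER_ELEMENT_OVERHEAD: usize = 32; // illustrative value only

fn effective_size(data_size: usize, entries: usize) -> usize {
    data_size + entries * PER_ELEMENT_OVERHEAD
}
```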

+ self.lru_child_storage.used_size()
+ self.lru_child_hashes.used_size()
// ignero small hashes storage + self.lru_hashes.used_size()
Member:

typo

@arkpar arkpar added A7-looksgoodcantmerge and removed A0-please_review Pull request needs code review. labels Jun 14, 2019
@gavofyork gavofyork merged commit 879e4a8 into paritytech:master Jun 14, 2019
folsen pushed a commit that referenced this pull request Jun 18, 2019
* child cache, and test failing notifications

* fix tests and no listen child on top wildcard

* remove useless method

* bump impl version

* Update core/client/src/notifications.rs

Co-Authored-By: Tomasz Drwięga <[email protected]>

* Update core/client/src/notifications.rs

Co-Authored-By: Tomasz Drwięga <[email protected]>

* Update core/client/src/notifications.rs

Co-Authored-By: Tomasz Drwięga <[email protected]>

* Update core/client/src/notifications.rs

Co-Authored-By: Tomasz Drwięga <[email protected]>

* factoring notification methods to remove some redundant code.

* test child sub removal

* HStorage implementation and some type alias.

* Remove HStorage cache: does not fit

* fix removal

* Make cache use byte length (shared) instead of number of kv

* Make use of hashes cache in rpc

* applying ratio on different lru caches

* Fix format

* break a line

* Remove per element overhead of lru cache.

* typo
MTDK1 pushed a commit to bdevux/substrate that referenced this pull request Jul 10, 2019