-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Rework storage iterators #13284
Rework storage iterators #13284
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
prefix: Option<StorageKey>, | ||
current_key: Vec<u8>, | ||
_phantom: PhantomData<Block>, | ||
skip_if_first: Option<StorageKey>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the iterators on the storage backend and on Client
work differently regarding the start_at
.
For the backend if a start_at
is specified then it will seek to that key and it will always include that key in the iterator's output. For the Client
if the key specified in start_at
matches exactly the first key it encounters then it skips that key.
This inconsistency is a little confusing, but that's how it works.
I've added some extra tests that verify this behavior, and I've also checked that they pass before my changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a point to be extremely careful with, retaining the sometime ~ behavior of iteration calls from the existing host functions (all that is touched by clear_prefix
from sp_externalities).
I remember something ~ with the way error are handled, but I may be confusing with something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll probably run a full burn-in before merging this just to make sure nothing's broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
fn was_complete(&self) -> bool { | ||
matches!(self.state, IterState::FinishedComplete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about FinishedIncomplete
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it finished incomplete
it didn't finish completely :D
}; | ||
if let Err(e) = result { | ||
debug!(target: "trie", "Error while iterating by prefix: {}", e); | ||
if self.root == Default::default() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always a bit awkward to tell if self.root is vec![0; 32] or hash(trie_empty_node) (eg MemoryDb::null_node_data), actually it looks like self.empty could be use instead (that is the root of a empty trie).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... why do we have self.empty
anyway? In practice won't H::default
always point to an empty storage root and a value which doesn't exist? (And idiomatically T::default()
is essentially always used to denote an empty something, e.g. Vec::default
, HashMap::default
, etc. are all empty, so it's kinda weird to have two things which mean "empty".)
But anyhow, as far as I can see changing this to self.root == self.empty
will fail at least one test (pairs_are_empty_on_empty_storage
) because the root
there is initialized with Default::default
, so the fake empty key self.empty
won't match, so it will go and call TrieDBRawIterator::new_prefixed
, and that will fail with an Invalid state root
error. (This worked previously without special-casing it like this because pairs
just ignored the errors, and this new iterator doesn't do that anymore.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H::default is [0; 32], that's an invalid trie root. Could have given the actual empty trie value but that's not convenient at all (makes the H depends on trie internals). In practice it 's just one step from saying any invalid root is an empty trie, but that would be confusing (like if you got a root and no content you cannot say if it is empty).
But yes that's rather awkward that some place allows H::default.
}); | ||
|
||
if let Err(error) = result { | ||
log::debug!(target: "trie", "Error while iterating the storage: {}", error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be good to add a comment or insist on the fact that ignoring error is a host function related behavior. Don't really know how to word it.
@@ -213,23 +213,29 @@ where | |||
.map_err(client_err) | |||
} | |||
|
|||
// TODO: This is horribly broken; either remove it, or make it streaming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the paged variant is supposed to replace this rpc, so was planed to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I know. But technically we don't necessarily need to remove those non-paged variants. We could just fix them and make them stream the data. (It's out of scope of this PR though.)
type Backend = RecordStatsState<S, B>; | ||
type Error = S::Error; | ||
|
||
fn next_key(&mut self, backend: &Self::Backend) -> Option<Result<StorageKey, Self::Error>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but could be good to record state (at least adding it looks straight forward).
Sure thing. |
Thanks a lot 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to approve btw :) (IIRC only the question of he empty root was a bit bugging me but it is mainly me being finicky).
@cheme Done. Do you want to get this merged as soon as possible, or can it wait a little while? As I've said, I wanted to run a full burn-in before merging this just to be on a safe side, but since running those takes time and is a pain I've planned to get my other storage iterator related changes finished too and run them in the same burn-in (even though I'm not going to push them to this PR) and only merge this once that passes. But if you want to get this merged sooner I can just start a burn-in only with these changes present. |
@tomaka probably want it as soon as possible, but alternatively I can publish 0.24.1 with the fix. |
Okay; then let me start a burn in with only the changes from this PR then. (Not sure if it matters but as a side effect this will also test your changes.) |
My burnin finished successfully, so this should be good to go I think. |
The CI pipeline was cancelled due to failure one of the required jobs. |
bot merge |
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
This PR reworks the way we iterate over storage.
Summary of the changes:
Vec
. There's no real good reason to have those (except maybe in tests), and unless extremely careful using them just results in things exploding.sp_state_machine::Backend
trait now has methods which return proper iterators. All of the iteration methods in there (apply_to_key_values_while
,for_keys_with_prefix
,pairs
,keys
,child_pairs
, etc.) are now default implemented on the trait itself, and are channeled through theraw_iter
method. This both is simpler to use (you can just use an iterator now, instead of having to use one of the methods which take a callback) and simpler to implement (you don't have to reimplement the whole zoo of methods which all do the same thing - iterate over the storage - but in subtly different ways).This requires a new version of
trie-db
(paritytech/trie#181).Potential future work
for_keys_with_prefix
, etc. Since the trait now exposes proper iterators those are now completely unnecessary. I would have done this in this PR, but the diff's already ~1.5k lines long, so I didn't want to bloat it up any further. I'll do it in the next PR once this one is merged.polkadot companion: paritytech/polkadot#6653