Profile the performance of in memory trie for shard 2 #10877
Comments
Performed an initial set of measurements of apply_chunk latency. See the related dashboard here. Observed a 3x-10x reduction (will repeat over a longer time period and with more shards later). I also observed long load times for memtrie:
Will look at what is going on there.
cc @robin-near
Measured the different phases of constructing the memory trie. It looks like most of the time is spent reading the flat state values from the database (not constructing the trie in memory).
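A minimal sketch of how per-phase timing like this could be collected, using only the standard library. Everything here is hypothetical and for illustration only: `PhaseTimer`, `load_with_timing`, and the phase names are not taken from the codebase, a clone stands in for the database read, and a sorted map stands in for the trie.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Accumulates wall-clock time per named phase (hypothetical helper).
#[derive(Default)]
pub struct PhaseTimer {
    totals: BTreeMap<&'static str, Duration>,
}

impl PhaseTimer {
    /// Runs `f`, adds its elapsed time to `phase`, and returns its result.
    pub fn time<T>(&mut self, phase: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *self.totals.entry(phase).or_default() += start.elapsed();
        out
    }

    /// Total time recorded for `phase`, if that phase ever ran.
    pub fn total(&self, phase: &str) -> Option<Duration> {
        self.totals.get(phase).copied()
    }
}

/// Toy load loop: `entries` plays the role of flat-state key/value pairs
/// and a `BTreeMap` plays the role of the in-memory trie (an assumption).
pub fn load_with_timing(
    entries: &[(Vec<u8>, Vec<u8>)],
) -> (BTreeMap<Vec<u8>, Vec<u8>>, PhaseTimer) {
    let mut timer = PhaseTimer::default();
    let mut trie = BTreeMap::new();
    for (key, value) in entries {
        // Phase 1: fetch the entry (a clone stands in for the DB read).
        let (k, v) = timer.time("read_flat_state", || (key.clone(), value.clone()));
        // Phase 2: insert the entry into the in-memory structure.
        timer.time("construct_trie", || {
            trie.insert(k, v);
        });
    }
    (trie, timer)
}
```

Comparing `timer.total("read_flat_state")` with `timer.total("construct_trie")` after a load gives the kind of split described above. Timing each tiny step adds some overhead of its own, so a real measurement would likely time larger batches.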
@tayfunelmas, does the long trie loading (from flat storage) and construction time impact performance? I was assuming this would be done behind the scenes before the next epoch.
It is called in two cases:
Besides, I found out that the latency does not come only from iterating over the flat state in RocksDB (example profiling view). It also comes from constructing the memtrie and computing its hashes: we encode/decode (serialize/deserialize) nodes between constructing the trie and computing hashes.
To make sure I understand the 'latency' here correctly: it is about when a validator can participate in the consensus mechanism, right? It shouldn't have anything to do with how long it takes a validator to perform chunk generation/validation.
Yes, once the memtrie is loaded, this latency will not contribute to later operations such as block/chunk production or validation. In fact, this load code is specific to the one-off loading of the state and is separate from the rest of the memtrie operations performed during block/chunk generation or validation.
We can move the memtrie loading part to a separate thread. I have a draft implementation. Will do it next week.
Do you want to move the entire load operation to a separate thread? How does it help? Assuming we are talking about node startup, the rest of the functionality needs to wait for the memtrie load anyway. I think we can parallelize certain parts of the load instead; for example, the hash computation can start earlier, in parallel, while the trie is being built (it is currently done after the trie is fully constructed). But I am not sure about running the entire load in a different thread; I might be missing something.
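To illustrate the idea of overlapping hashing with construction, here is a rough sketch in which the builder hands each finished node to a worker thread over a channel. This is not the actual loader: `toy_hash`, `encode_node`, and both functions are hypothetical, `DefaultHasher` stands in for the real cryptographic hash, and nodes are treated as independent. In a real Merkle trie a node's hash depends on its children, so only completed subtrees could be handed off this way.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Stand-in for a node hash (assumption: not the hash the trie really uses).
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Stand-in for node construction: a length-prefixed encoding of the value.
fn encode_node(value: &[u8]) -> Vec<u8> {
    let mut node = (value.len() as u32).to_le_bytes().to_vec();
    node.extend_from_slice(value);
    node
}

/// Baseline: build every node first, then hash them all afterwards.
pub fn build_then_hash(values: &[Vec<u8>]) -> Vec<u64> {
    let nodes: Vec<Vec<u8>> = values.iter().map(|v| encode_node(v)).collect();
    nodes.iter().map(|n| toy_hash(n)).collect()
}

/// Pipelined: the calling thread builds nodes while a worker hashes the
/// ones that are already finished. Hashes come back in construction order.
pub fn build_and_hash_pipelined(values: &[Vec<u8>]) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let worker = thread::spawn(move || {
        rx.iter().map(|node| toy_hash(&node)).collect::<Vec<u64>>()
    });
    for value in values {
        tx.send(encode_node(value)).expect("hashing worker exited early");
    }
    // Dropping the sender closes the channel so the worker's loop ends.
    drop(tx);
    worker.join().expect("hashing worker panicked")
}
```

Both functions return the same hashes; the pipelined one only changes when the hashing work happens. Whether this helps in practice depends on how the cost is split between reading, building, and hashing.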
I was thinking about catchup. Yes, for startup it might be hard to do. For startup I thought that maybe we could start with the regular trie and load the memtrie in the background, but that's probably not possible, as we might need the state not to change during the load.
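A small sketch of the background-load shape for the catchup case, assuming the loader reads from an immutable snapshot so that the state cannot change underneath it. `Snapshot`, `ToyTrie`, and `spawn_load` are hypothetical names, and a sorted map again stands in for the memtrie.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Immutable, shareable view of the key/value state (assumed for this sketch).
pub type Snapshot = Arc<Vec<(Vec<u8>, Vec<u8>)>>;
/// Stand-in for the in-memory trie.
pub type ToyTrie = BTreeMap<Vec<u8>, Vec<u8>>;

/// Starts the load on a separate thread and returns immediately.
/// The caller keeps working and calls `join()` once it needs the trie.
pub fn spawn_load(snapshot: Snapshot) -> JoinHandle<ToyTrie> {
    thread::spawn(move || snapshot.iter().cloned().collect())
}
```

The caller would hold on to the `JoinHandle` and join it at the point where the loaded trie is first required, for example before the shard starts being tracked.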
Previously, we concluded that with optimizations in the runtime, the bottleneck of apply is mostly storage operations, which the in-memory trie should help a lot with. It would be good to understand how much we gain by enabling the in-memory trie for a shard and what the remaining performance bottlenecks are.