-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement new trie_walker which can iterate full+partial tries #1567
feat: implement new trie_walker which can iterate full+partial tries #1567
Conversation
e7b1951
to
e3759ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No major concerns, but several non-trivial things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of partially implementing eth_trie::DB
trait, how about we define out own trait. Something like this:
trait TrieWalkerDb {
fn get(&self, key: &[u8]) -> anyhow::Result<Option<Bytes>>;
}
Then we can implement it for whatever structure we want
impl TrieWalkerDb for HashMap<B256, Bytes> {
...
}
impl TrieWalkerDb for AccountDB {
...
}
impl TrieWalkerDb for TrieRocksDB {
...
}
And then we would have something like this in the other file:
pub struct TrieWalker<DB: TrieWalkerDb> {
is_partial_trie: bool,
db: Arc<DB>,
stack: Vec<TrieProof>,
}
What do you think?
If you decide to go this path, ignore the rest of the comments in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can even add some custom logic if needed. Not sure how useful it is, but we can do something like this:
trait TrieWalkerDb {
fn get(&self, key: &[u8]) -> anyhow::Result<Option<Bytes>>;
fn get_node(&self, key: &[u8]) -> anyhow::Result<Option<Node>> {
self.get(key)?
.map(|bytes| decode_node(&mut bytes.as_ref()).map_err(|err| anyhow!(err)))
.transpose()
}
}
|
||
use crate::types::trie_proof::TrieProof; | ||
|
||
/// This is used for walking the whole state trie and partial tries (forward state diffs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Usually, first API documentation line is separate by blank line from the rest of the documentation.
Also, I think other lines should start with capital letter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think I did what you asked
/// 3. getting stats about the state trie | ||
pub struct TrieWalker<TrieDB: DB> { | ||
is_partial_trie: bool, | ||
trie: Arc<TrieDB>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need Arc<..>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes becuase trie.lock().db.clone()
gives us an Arc
Node::Branch(branch) => { | ||
let branch = branch.read().expect("Branch node must be readable"); | ||
|
||
// We don't need to check the branches value as it is already encoded in the proof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this comment is not needed. We are iterating Trie nodes, not Trie values.
))))); | ||
{ | ||
let mut trie = trie.lock(); | ||
trie.insert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in a loop?
async fn test_state_walker() { | ||
let temp_directory = create_temp_test_dir().unwrap(); | ||
let db = Arc::new(setup_rocksdb(temp_directory.path()).unwrap()); | ||
let trie = Arc::new(Mutex::new(EthTrie::new(Arc::new(TrieRocksDB::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this to be Arc<Mutex<...>>
?
} | ||
|
||
let root_hash = trie.lock().root_hash().unwrap(); | ||
let walker = TrieWalker::new(root_hash, trie.lock().db.clone()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't trie.lock().db.clone()
just db
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an Arc of the traited object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it is Arc<DB>
}; | ||
leaf_count += 1; | ||
|
||
// reconstruct the address hash from the path so that we can fetch the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this comment. What are we fetching from the database?
Also, what do you think about moving this to |
I don't see value into moving this into
So I don't think it makes sense to store it in I also plan to extend this implementation with logic specific to horizontal scaling; I think moving my trie-walker implementation to |
@morph-dev ready for another review |
trin-execution/src/trie_walker/db.rs
Outdated
|
||
impl TrieWalkerDb for HashMap<B256, Vec<u8>> { | ||
fn get(&self, key: &[u8]) -> anyhow::Result<Option<Bytes>> { | ||
Ok(self.get(key).cloned().map(|vec| vec.into())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can probably be:
Ok(self.get(key).map(Bytes::copy_from_slice))
@@ -7,7 +7,7 @@ use rocksdb::DB as RocksDB; | |||
pub struct TrieRocksDB { | |||
// If "light" is true, the data is deleted from the database at the time of submission. | |||
light: bool, | |||
storage: Arc<RocksDB>, | |||
pub storage: Arc<RocksDB>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this to be public. You just need to use eth_trie::DB
in other place.
trin-execution/src/trie_walker/db.rs
Outdated
|
||
impl TrieWalkerDb for TrieRocksDB { | ||
fn get(&self, key: &[u8]) -> anyhow::Result<Option<Bytes>> { | ||
self.storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add use eth_trie::DB
, you should be able to remove pub
from storage field and do something like this:
self.get(key)
.map(|result| result.map(Bytes::from))
.map_err(...)
use alloy::primitives::{Bytes, B256}; | ||
use anyhow::{anyhow, Ok}; | ||
use db::TrieWalkerDb; | ||
use eth_trie::{decode_node, node::Node}; | ||
|
||
use crate::types::trie_proof::TrieProof; | ||
|
||
/// This is used for walking the whole state trie and partial tries (forward state diffs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I was thinking more like this:
/// Iterates over trie nodes from the whole or partial state trie
///
/// Use cases are:
/// 1. Gossiping the whole state trie
/// 2. Gossiping the forward state diffs (partial state trie)
/// 3. Getting stats about the state trie
.into(); | ||
impl<DB: TrieWalkerDb> TrieWalker<DB> { | ||
pub fn new(root_hash: B256, trie: Arc<DB>) -> anyhow::Result<Self> { | ||
let root_value = match trie.get(root_hash.as_slice())? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe rename root_value
to root_node_trie
?
|
||
self.stack.push(TrieProof { | ||
path, | ||
proof: [partial_proof, vec![value]].concat(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no reason to use concat()
, as I believe that it would iterate and clone partial_proof (it should be cheap, but still not needed). We can:
let mut proof = partial_proof
proof.push(value);
self.stack.push(TrieProof { path, proof });
or if you prefer:
partial_proof.push(value);
self.stack.push(TrieProof {
path,
proof: partial_proof,
});
} | ||
Ok(()) | ||
} | ||
} | ||
|
||
impl<TrieDB: DB> Iterator for TrieWalker<TrieDB> { | ||
/// Panics if the trie is corrupted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should probably be on TrieWalker
, as I think this will never be shown in IDE or elsewhere.
@@ -170,126 +161,37 @@ impl<TrieDB: DB> Iterator for TrieWalker<TrieDB> { | |||
#[cfg(test)] | |||
mod tests { | |||
use super::*; | |||
|
|||
use eth_trie::{EthTrie, RootWithTrieDiff, Trie}; | |||
use revm_primitives::{keccak256, Address, B256, U256}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alloy::primitives
}; | ||
leaf_count += 1; | ||
|
||
// reconstruct the address hash from the path so that we can fetch the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this comment is still not clear to me. what are we fetching from the database?
aren't we fetching proof directly from the trie?
continue; | ||
}; | ||
|
||
// reconstruct the address hash from the path so that we can fetch the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here
Wouldn't it make sense to be in I thought that |
I plan to use it in
Sure, I think a different design would work better for Glados's use case |
What was wrong?
The old
TrieWalker
implementation I did only worked for trie-diffs as it did everything in memory. This is bad because we want to be able to gossip the whole state trie onto the network.My new
trie_walker
implementation will also allow us to reuse code for horizontal scaling (i.e. walking the trie with ranged scopes). So either way having 1 trie implementation which is more flexible should be easier to maintain, keeping both is redundent.How was it fixed?
Implement a trie-walker which builds iterative proofs sequentially, instead of building a map of back traces from every node like the old trie-walker did.
Using this new trie-walker I was able to iterate whole the whole state at block 21 million while only using 1.4GB of ram which is really good as the whole state is almost 400GB
I did some testing and both implementations generate the same amount of content I tested it from 0-100k