Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement .era2 (state snapshots) import/export from Trin Execution #1344

Merged
merged 4 commits into from
Sep 14, 2024

Conversation

KolbyML
Copy link
Member

@KolbyML KolbyML commented Jul 22, 2024

What was wrong?

we have .era2 types in e2store, but we don't support importing or exporting these files using trin-execution

How was it fixed?

By implementing a minimal trie walking implementing which provides an api where you can loop over the leafs of a trie.

Then we implement

  • exporting to era2
  • importing from era2

@morph-dev
Copy link
Collaborator

I know this is still in progress and maybe you already had this in mind, but this pr is huge! Please split it up in smaller parts before sending for review.

I assume some work is already done, even if not final, and can already be reviewed. For example new structure/type definitions and helping structures/functions around them (without introducing much new logic with the existing code, if not needed).

@KolbyML
Copy link
Member Author

KolbyML commented Aug 4, 2024

I know this is still in progress and maybe you already had this in mind, but this pr is huge! Please split it up in smaller parts before sending for review.

I assume some work is already done, even if not final, and can already be reviewed. For example new structure/type definitions and helping structures/functions around them (without introducing much new logic with the existing code, if not needed).

It is hard to take this comment seriously #1344 (comment) when if it should be said at all, it should be said on this PR #1337

Realistically this work can be broken up into 3 PR's, and it is already in 2 #1344 and #1337, the 3rd one would be a PR for fixing sender recovery

I am not going to open a PR for review until I make a dump of state at block 15 million

  • I have fixed a few bugs well executing the code
  • It doesn't make sense to review any of this code yet, for one it is currently being tested, 2 it risks what I am executing.

I am sure not everyone was aware this PR was already split up into 2 PRs, these PR's are drafts for a reason, they are not ready to be reviewed.

These PR's will be ready for review in a week after that is done

  • add executing from era support
  • potentially change to something like MXDB as a database.

@morph-dev
Copy link
Collaborator

My comment wasn't specifically tied to this PR, and definitely not to this PR at the current stage (still in progress), but in general.
It was a recommendation / suggestion to try to split big PRs into smaller once before sending for review.

I apologize if you were planning on doing this anyway and my feedback was coming on not desired time (not being send for review), but I wanted to provide early feedback and avoid delays later on (I find it easier to split work while I'm working on it, not when everything is done, but that's just my personal workflow).
Also, if certain work is more or less complete (e.g. defining era2 types), then it can we extracted into separate PR and be reviewed already (if you want with a note that it's not a final version), and reduce time for future reviews. But it's up to you to estimate how complete each part is and whether they can be extracted.

I commented on this PR because you updated it recently. And I acknowledge that you already split it up before, but I believe that they should be split into even smaller PRs. And again, I apologize if this is something that you were planning on doing anyway

@KolbyML
Copy link
Member Author

KolbyML commented Aug 4, 2024

Also, if certain work is more or less complete (e.g. defining era2 types), then it can we extracted into separate PR and be reviewed already (if you want with a note that it's not a final version),

I don't think this is logical to try to merge .era2 without import or export, how do we know if .era2 is even useful without that. Making a PR for unused types is redundant, I don't find value in that. I think there is a lot of value in being able to export an .era2 file and import it from the PR adding .era2.

reduce time for future reviews. But it's up to you to estimate how complete each part is and whether they can be extracted.

The main delay for me "opening these PR's for review", is I am currently executing blocks using this code. I am not really concerned with reduce time for future reviews. I am concerned with not accidentally running a change suggested in a PR which corrupts my state database for xyz reason. It is easier and less risky to get the state snapshot then open the PR's for review.

I am not really able to test .era2 until I am done executing, I have brought this up in meetings.

@pipermerriam
Copy link
Member

pipermerriam commented Aug 4, 2024

I feel that @morph-dev has a reasonable point here that should be highlighted.

There is an often cited study that showed that code review effectiveness drops off significantly after 400 lines of code and is most effective when kept under 200 lines of code.

These shouldn't be strict rules of course.

Which is probably why the suggestion was made to extract any pieces of this (like the types) into a standalone PR. You are correct that the types aren't much use on their own. However, the act of splitting them into their own PR has value on it's own because that PR is easy to review AND it reduces the size of the final PR which means that the other members of the team will both have an easier time reviewing this code and the review they give will be higher quality.

@KolbyML
Copy link
Member Author

KolbyML commented Aug 4, 2024

@morph-dev
Ok I made these PR's which are ready for review based off the 2 I already have

I am fine to split .era2 types and import/export into 2 different PR's, but I can't do that in good faith till it is battle tested around 4-5 days from now.

The 3 PR's I made above don't make any changes to the database code so they are fine to be reviewed right now

@KolbyML KolbyML force-pushed the era-xx2 branch 9 times, most recently from 67ff3df to 50a31fb Compare August 13, 2024 02:42
@KolbyML KolbyML marked this pull request as ready for review August 13, 2024 02:44
@KolbyML KolbyML requested a review from morph-dev August 13, 2024 02:45
@carver
Copy link
Collaborator

carver commented Aug 27, 2024

I will send this back from "in review" to "in progress" since it's got some conflicts to resolve

@KolbyML KolbyML changed the title feat: implement .era2 (state snapshots) + import/export from Trin Execution feat: implement .era2 (state snapshots) import/export from Trin Execution Aug 29, 2024
@KolbyML KolbyML force-pushed the era-xx2 branch 8 times, most recently from a2366f5 to fdf711e Compare August 30, 2024 04:31
@KolbyML KolbyML force-pushed the era-xx2 branch 2 times, most recently from abd0c35 to 26444cb Compare September 8, 2024 09:20
@KolbyML
Copy link
Member Author

KolbyML commented Sep 8, 2024

@morph-dev this PR is ready for review

Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that leaf iteration would not work on storage tries. Did you test this code and does it work? (e.g. on block 1_000_000)?

I also think that you can utilize some functionality from ethportal-api/src/types/state_trie/trie_traversal.rs. It would be even better if we can move this traversal there as well (or at least partially). I have an idea on how it can be done nicely, but I have to try it out to see if it works.

}

#[derive(Args, Debug, Default, Clone, PartialEq)]
pub struct ImportState {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe ImportStateConfig? At least for me, State implies that it actually contains some state.

Same below.

// insert the last 256 block hashes into the database
let first_block_hash_to_add = max(
0,
state_importer.trin_execution.next_block_number() - BLOCKHASH_SERVE_WINDOW,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would panic if state_importer.trin_execution.next_block_number() is less than BLOCKHASH_SERVE_WINDOW

let mut state_importer = StateImporter::new(trin_execution, import_state);
state_importer.import_state()?;

// insert the last 256 block hashes into the database
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think that maybe this entire logic (of inserting 256 block headers) should live somewhere in subcommands folder.

trin-execution/src/subcommands/era2.rs Outdated Show resolved Hide resolved
trin-execution/src/subcommands/era2.rs Outdated Show resolved Hide resolved
};

let mut storage: Vec<StorageItem> = vec![];
if account_state.storage_root != keccak256([EMPTY_STRING_CODE]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't we use EMPTY_ROOT_HASH constant?

}

// Get the rounded up storage count
let storage_count = storage.len() / MAX_STORAGE_ITEMS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to use div_ceil:

let storage_count = storage.len().div_ceil(MAX_STORAGE_ITEMS);

storage_count: storage_count as u32,
}))?;

for _ in 0..storage_count {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to use Vec::chunks or itertools::chunks

I would prefer standard one, but I'm not sure if it can avoid copying elements. The itertools can.

You drain approach, while it avoids making copies, would still cause a lot of objects in memory moving around (as you are draining from the beginning of a Vec)

)))?;
}

if accounts_processed % 10000 == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe do this printing after you increase the counter.
Or if you do it before the increment because you want to see 0 being printed, do it at the start of the loop (but that's redundant as you already have "Trie leaf iterator initiated" printed at the start).

trin-execution/src/subcommands/era2.rs Outdated Show resolved Hide resolved
@morph-dev
Copy link
Collaborator

After spending quite some time implementing trie traversal myself (pr/1344 branch in my trin fork), I realized that EthTrie already has trie iterator: https://docs.rs/eth_trie/0.4.0/eth_trie/struct.EthTrie.html#method.iter. I think we should use it (but you should check and make sure that it traverses the trie in-order)

@KolbyML
Copy link
Member Author

KolbyML commented Sep 9, 2024

I think that leaf iteration would not work on storage tries. Did you test this code and does it work? (e.g. on block 1_000_000)?

I both exported and imported .era2 files I generated at block 1 million and 15 million so I can factually say this code properly handles storage tries

@morph-dev
Copy link
Collaborator

morph-dev commented Sep 10, 2024

I both exported and imported .era2 files I generated at block 1 million and 15 million so I can factually say this code properly handles storage tries

I believe that it works for your cases. After a bit of thinking, I now think that it's likely going to work on all blocks. But there is a possible case when it might not. And while chances of such case happening are very low, it's theoretically possible.

Now that we are switching to the iteration done by EthTrie, this becomes irrelevant. I still want to document the problem, as it's important concept to understand regarding how MPT works in general.


The issue is the following. If encoding of a node is less than 32 bytes, its parent would inline that node directly (instead of using hash of it).

For account tries, this can't happen because:

  • each leaf is rlp encoding of the account state (which includes storage_root and code_hash, making it at least 64 bytes)
  • all other nodes include at least one child, meaning they are longer than 32 bytes as well

For storage trie, this is different, as values that are stored can be very short (only 1 byte). But since leaf nodes include path from parent node, in majority of cases, encoding of the entire leaf node would be at least 32 bytes (there are some extra bytes used by rlp encoding itself).

But if the tree is deep, this path can be small enough, resulting in leaf node being inlined directly into parent (instead of as hash), and if encoding of that parent is small, it can be included in its own parent, etc.

Your code handles the case when child of a branch node is inlined leaf node (so I assume you encountered this is practice), but it's technically possible for child of a branch or extension node to be inlined another branch or extension node. Chances for this happening are very-very small (because each additional node that you are inlining adds some, potentially quite a bit, extra bytes for encoding), and we might not encounter it in practice. But I would say it's important corner case that should be covered so that we are resilient regardless of the size (depth) of the trie.

@KolbyML
Copy link
Member Author

KolbyML commented Sep 13, 2024

I both exported and imported .era2 files I generated at block 1 million and 15 million so I can factually say this code properly handles storage tries

I believe that it works for your cases. After a bit of thinking, I now think that it's likely going to work on all blocks. But there is a possible case when it might not. And while chances of such case happening are very low, it's theoretically possible.

Now that we are switching to the iteration done by EthTrie, this becomes irrelevant. I still want to document the problem, as it's important concept to understand regarding how MPT works in general.

The issue is the following. If encoding of a node is less than 32 bytes, its parent would inline that node directly (instead of using hash of it).

For account tries, this can't happen because:

  • each leaf is rlp encoding of the account state (which includes storage_root and code_hash, making it at least 64 bytes)
  • all other nodes include at least one child, meaning they are longer than 32 bytes as well

For storage trie, this is different, as values that are stored can be very short (only 1 byte). But since leaf nodes include path from parent node, in majority of cases, encoding of the entire leaf node would be at least 32 bytes (there are some extra bytes used by rlp encoding itself).

But if the tree is deep, this path can be small enough, resulting in leaf node being inlined directly into parent (instead of as hash), and if encoding of that parent is small, it can be included in its own parent, etc.

Your code handles the case when child of a branch node is inlined leaf node (so I assume you encountered this is practice), but it's technically possible for child of a branch or extension node to be inlined another branch or extension node. Chances for this happening are very-very small (because each additional node that you are inlining adds some, potentially quite a bit, extra bytes for encoding), and we might not encounter it in practice. But I would say it's important corner case that should be covered so that we are resilient regardless of the size (depth) of the trie.

The original question was if when I ran my code it worked in the first place which I said it did, I didn't comment on 1 byte data being compacted, because it wasn't asked or seemed probable to our usecase, but as you said Now that we are switching to the iteration done by EthTrie, this becomes irrelevant

@KolbyML
Copy link
Member Author

KolbyML commented Sep 13, 2024

@morph-dev ready for another review, I believe I resolved all the PR's concerns.

I also tested the changes by

  • executing to 1 million
  • exporting it to a .era2 file
  • deleting my data dir
  • importing the .era2 file I just generated
  • executing to 2 million plus blocks

Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work :shipit:

@@ -24,6 +24,17 @@ impl Default for Account {
}
}

impl From<AccountStateInfo> for Account {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not related to this PR, but is there any particular reason why we need Account type? Can't we just use AccountState from ethportal_api?

The only additional functionality that it has is from_account_info, which I think we don't want to add to ethportal_api (we don't want to add revm_primitives as deps), but considering that it's used only in one place, we can just inline it (or create utility function).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since they are already in their own module and have non trivial logic, I would extract imported and exported into separate files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like that import and export are part of the main executable. It doesn't make sense with the rest of the functionality of the main.rs (I think they don't even share any command line flags).

Why not make them as two separate binaries (e.g. src/bin/import.ts and src/bin/export.rs), each with their own configuration?

If you think this is good idea, but want to get this PR in asap, we can do it in a separate PR (I have some free cycles, so I can do it). What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The team has discussed this in the past and the conclusion has came to including utility sub-commands in the main binary as then you always know the utility's work with your binary version. It is also less tedious to use them. Also most Execution Clients I read follow this practice. Multiple binaries feels like a lot of work both from a users perspective and developer's perspective

if let Some(command) = trin_execution_config.command {
match command {
TrinExecutionSubCommands::ImportState(import_state) => {
let mut state_importer = StateImporter::new(trin_execution, import_state);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's weird to pass trin_executable to it. I understand that we have some useful objects there (db), but it just looks weird. If anything, it should return TrinExecution object (but that doesn't make sense if we just finish after is done).

let mut era2 = Era2::create(self.exporter_config.path_to_era2.clone(), header)?;
info!("Era2 initiated");
info!("Trie leaf iterator initiated");
let mut accounts_processed = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe accounts_exported?

vec![]
};

let mut storage: Vec<StorageItem> = vec![];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't know if this is big problem, but I don't like that we are storing all storage items in memory, instead of exporting them to file. Don't know how big storage can get and if this is an issue (at least on weaker machines).

I know that this is needed in order to write storage_count in the AccountEntry first. The possible solutions would be:

  1. iterate entire trie once and count number of items, write AccountEntry, iterate second time and write as you go
    • considering that reading from db is most likely the slowest part, this would make export twice as slow
  2. Start processing as we do now.
    1. if we finish processing entire storage trie before getting to MAX_STORAGE_ITEMS, do as we do now (in which case, there will be only one StorageEntry)
    2. If we process MAX_STORAGE_ITEMS, stop, and do some variation of 1. from above.
    • this approach gives us best performance in majority of cases (single StorageEntry), but still handles big storage tries. but it's not elegant, and questionable how much it would actually slow things down

Overall, I think current implementation is good for now, and if we notice issue with later (bigger) blocks, we can evaluate if improvement is needed. So no action is expected here (except if you want to create an issue to document this and provide your opinion).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If slower machines can't even run an execution client, I am not sure why they should be able to generate a .era2 file.

They should minimum have 16GB of ram, and if that is the case there should be no problem, and if they need more they can add a swap file.

This isn't an issue, creating .era2 files is already

  • expensive
  • slow
  • niche

99% of people won't be generating these files, they will be using the files we generated.

So I don't think we should optimize this for the lowest case, they can use a .era2 file we generated or use a functional machine.


// Build storage trie
let account_db = AccountDB::new(address_hash, self.trin_execution.database.db.clone());
let mut account_trie = EthTrie::new(Arc::new(account_db));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be storage_trie

{
account_trie
.insert(storage_index_hash.as_slice(), &alloy_rlp::encode(value))?;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call account_trie.root_hash()? once per StorageEntry, in order to commit change to db and lower RAM usage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


// Insert contract if available
if !bytecode.is_empty() && account_state.code_hash != KECCAK_EMPTY {
ensure!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: With this check being inside if, it's possible that we will accept invalid era2 file, for example: bytecode provided with this account is empty but code_hash is not (and serious attack vector if the same code_hash is hash of bytecode of some other account).

Therefore, I think it's better to move this check before if statement. And with this check present, we don't need both conditions in the if statement (!bytes.is_empty() is enough).

let mut era_manager = EraManager::new(first_block_hash_to_add).await?;
for block_number in first_block_hash_to_add..self.trin_execution.next_block_number() {
let block = era_manager.get_next_block().await?;
ensure!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not sure how I feel about this check. On one side, it's good to double check, on the other side, we rely on EraManager returning blocks in consecutive order in other places.

If it was me, I would have trusted that EraManager does the correct job, and just do something like this:

while era_manager.next_block_number() < self.trin_execution.next_block_number() {
    let block = era_manager.get_next_block().await?;
    self.trin_execution.database.db.put(...);
}

Up to you.

@KolbyML KolbyML merged commit 62dd518 into ethereum:master Sep 14, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants