Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: command to dump preimages enumerated in snapshot order, in a flat file #27819

Closed
wants to merge 2 commits into from

Conversation

gballet
Copy link
Member

@gballet gballet commented Jul 31, 2023

This is a rebase of gballet#246 on top of master.

}
exportOverlayPreimagesCommand = &cli.Command{
Action: exportOverlayPreimages,
Name: "export-overlay-preimages",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Name: "export-overlay-preimages",
Name: "export-preimages",

I mean, it's just exporting preimages, right?

@@ -374,6 +374,73 @@ func ExportPreimages(db ethdb.Database, fn string) error {
return nil
}

// ExportOverlayPreimages exports all known hash preimages into the specified file,
// in the same order as expected by the overlay tree migration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify what that "same order" entails

stIt.Release()
}
count++
if count%100000 == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please instead add a log output every 8 seconds or so, Exporting preimages with some stats about how many, how much remaining, how much time elapsed etc.

Comment on lines +382 to +385
fh, err := os.OpenFile(fn, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncate? Wouldn't it be nicer if we could abort/resume, somehow?

Comment on lines +405 to +410
for accIt.Next() {
acc, err := types.FullAccount(accIt.Account())
if err != nil {
return fmt.Errorf("invalid account encountered during traversal: %s", err)
}
addr := rawdb.ReadPreimage(statedb.Database().DiskDB(), accIt.Hash())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, interesting. So, your way of doing this is to

  • Iterate the account-snapshot, (ordered by hashed address)
  • For each h(a), read preimage.

An alternative way to do it would be to iterate the premages: in a first phase, out to an external file. In a second phase, that file could be (iteratively) sorted by post-image instead of pre-image.

It would be interesting to see the differences in speed between the two approaches.

A benefit with the second approach is that it wouldn't be sensitive to new data -- if you get an additional chunk of preimages (either from a few more blocks, or some external source), you could just append them to the "unsorted" file, and then re-sort it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now that the order you want is not just "ordered by post-hash", the ordering is "ordered by post-hash account, then the preimages for that account's storage trie preimages". So the final ordering would be

acc1 
acc2
storagekey1-acc2
storagekey2-acc2
acc3

Seems like a pretty strange ordering -- and also highly sensitive to changes in state. If you advance one block, you need to redo everything, because the ordering is state-dependent and cannot be performed given only the data itself.

Comment on lines +418 to +419
if acc.Root == types.EmptyRootHash {
stIt, err := chain.Snapshots().StorageIterator(mptRoot, accIt.Hash(), common.Hash{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems bass-ackwards. No need to create an iterator if the root is the empty root hash. Tho it's faster for sure :)

@holiman
Copy link
Contributor

holiman commented Nov 28, 2023

Superseded by #28256

@holiman holiman closed this Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants