Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist conversion state to db and use an LRU cache for active transition states #375

Merged
merged 14 commits into from
Feb 22, 2024

Conversation

gballet
Copy link
Owner

@gballet gballet commented Feb 10, 2024

Some issue appeared in the shadowfork branch, due to the fact that when, for whatever reason, a new database is created, it will check if verkle was activated at genesis as a heuristic to determine if the transition has already happened. If so, it will declare ended as true.

When a block is inserted through the execution engine api, such a database is created, and it seems that it determines the activation of verkle the wrong way. This is because it uses the current time in order to see if verkle is active at genesis. This works at genesis, but if the database is created AFTER the verkle activation, it will assume that the transition has already completed. This creates a weird situation in which the miner produces a block whose conversion has started (and potentially completed) but the insertion into the blockchain when the produced block is passed via the engine api, says it's already completed, and thus it doesn't stop.

This is only one of many issues that stem from the fact that there is no disk storage of the conversion state, and therefore that it needs to either be centralized or "guessed". For example, it is impossible to stop geth during the conversion, to be resumed later. Another issue, not completely related, is that all the states were stored in a map that keeps growing.

In order to address all these problems, the conversion state at each block is now written to disk, and then put in an LRU list. If the conversion state for a given block root is not found in the LRU list, the database is searched for it. If it's not there, then the assumption that we are looking at a fresh database, and so the heuristc "was verkle active at genesis" is correct to determine if the conversion has ended.

There is still the question of whether there should be a way not to use the conversion state once the conversion has finalized. This is not a big problem anymore, as the LRU prevents the list from growing indefinitely. Nonetheless, it would be nice to cover than in a future PR.

This PR also includes a few "riders":

  • adding a CI test to perform the conversion on a simple network
  • adds a lock for the transition in order to avoid concurrency issues between the engine api handler and the miner.

core/state/database.go Outdated Show resolved Hide resolved
@gballet
Copy link
Owner Author

gballet commented Feb 10, 2024

Some issue in replay:

ERROR[02-10|20:47:37.163] failed to encode transition state        err="rlp: type int64 is not RLP-serializable (struct field state.TransitionState.CurrentPreimageOffset)"

And if the encoding isn't working, then it doesn't save it even in the LRU.

core/state/database.go Outdated Show resolved Hide resolved
core/state/database.go Outdated Show resolved Hide resolved
@gballet gballet marked this pull request as ready for review February 11, 2024 09:41
core/overlay/conversion.go Outdated Show resolved Hide resolved
core/overlay/conversion.go Outdated Show resolved Hide resolved
core/overlay/conversion.go Outdated Show resolved Hide resolved
core/overlay/conversion.go Outdated Show resolved Hide resolved
core/state/database.go Outdated Show resolved Hide resolved
core/state/database.go Outdated Show resolved Hide resolved
core/state/database.go Outdated Show resolved Hide resolved
core/state_processor.go Outdated Show resolved Hide resolved
@gballet gballet merged commit 31f522b into transition-post-genesis-rebase-1 Feb 22, 2024
1 check passed
gballet added a commit that referenced this pull request Feb 28, 2024
…tion states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review
gballet added a commit that referenced this pull request Mar 26, 2024
…tion states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review
gballet added a commit that referenced this pull request Apr 15, 2024
…tion states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review
gballet added a commit that referenced this pull request Apr 29, 2024
quell linter issue

support Transition post tree in conversion

Refactor transition post genesis (#311)

* rewrite per-block conversion pointer management

* remove unused method

* fix: a branch that can verge at genesis or post genesis (#314)

* fix: import cycle in conversion refactor (#315)

* fix shadowfork panic in OpenStorageTrie

fix OpenStorageTrie: return an error instead of panicking if the account tree not a verkle tree (#321)

add a switch to force proof in blocks (#322)

* add a switch to force proof in blocks

* activate switch

* fix switch type

add switch to override the stride of the overlay conversion (#323)

* add switch to override the stride of the overlay conversion

* set a default stride of 10k

add a few traces for relaunch

add missing step in the chain of setting the override for overlay stride

fix: save and load transition state for block processing (#324)

* fix: save and load transition state for block processing

* log whether the tree is verkle in LoadTransitionState

* fix: ensure the transition is marked as started in insertChain

* dump saved address

* fix nil pointer panic

* remove stacktrace that is no longer useful

* fix a panic

* fix build

* check: copy current account address BEFORE it's saved

* mandatory panic fix

* Remove debug fmt.Println

* more cleanup + comments

fix: ensure StorageProcessed is properly recovered

add traces for gas pool overflow

cmd/{geth, utils}: add a command to clear verkle costs (#326)

* cmd/{geth, utils}: add a command to clear verkle costs

fix:  boolean issue

fix: load finalization state in FinalizeAndAssemble (#340)

* Conversion and TransitionTrie fixes (#346)

* fixes

Signed-off-by: Ignacio Hagopian <[email protected]>

* remove old comment

Signed-off-by: Ignacio Hagopian <[email protected]>

---------

Signed-off-by: Ignacio Hagopian <[email protected]>

* trace cleanup

---------

Signed-off-by: Ignacio Hagopian <[email protected]>
Co-authored-by: Ignacio Hagopian <[email protected]>

fix: check for nil override in NewBlockChain (#357)

fix rebase issue

fix a few rebase issues

add debug traces

fix: assume that having to create a new transaction state mean it hasn't happened.

This is a workaround, because it will always be false when restarting a node
that has started the transition but not necessarily completed it.

add logs to debug shadowfork

persist conversion state to db and use an LRU cache for active transition states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review

eth: add debug_conversionStatus RPC call (#392)

* eth: add debug_conversionStatus RPC call

* add debug trace4s

* Apply suggestions from code review

* export started/ended fields

fix post-verge sync (#404)

* fix post-verge sync

* review: fix truncated comment
gballet added a commit that referenced this pull request Apr 29, 2024
quell linter issue

support Transition post tree in conversion

Refactor transition post genesis (#311)

* rewrite per-block conversion pointer management

* remove unused method

* fix: a branch that can verge at genesis or post genesis (#314)

* fix: import cycle in conversion refactor (#315)

* fix shadowfork panic in OpenStorageTrie

fix OpenStorageTrie: return an error instead of panicking if the account tree not a verkle tree (#321)

add a switch to force proof in blocks (#322)

* add a switch to force proof in blocks

* activate switch

* fix switch type

add switch to override the stride of the overlay conversion (#323)

* add switch to override the stride of the overlay conversion

* set a default stride of 10k

add a few traces for relaunch

add missing step in the chain of setting the override for overlay stride

fix: save and load transition state for block processing (#324)

* fix: save and load transition state for block processing

* log whether the tree is verkle in LoadTransitionState

* fix: ensure the transition is marked as started in insertChain

* dump saved address

* fix nil pointer panic

* remove stacktrace that is no longer useful

* fix a panic

* fix build

* check: copy current account address BEFORE it's saved

* mandatory panic fix

* Remove debug fmt.Println

* more cleanup + comments

fix: ensure StorageProcessed is properly recovered

add traces for gas pool overflow

cmd/{geth, utils}: add a command to clear verkle costs (#326)

* cmd/{geth, utils}: add a command to clear verkle costs

fix:  boolean issue

fix: load finalization state in FinalizeAndAssemble (#340)

* Conversion and TransitionTrie fixes (#346)

* fixes

Signed-off-by: Ignacio Hagopian <[email protected]>

* remove old comment

Signed-off-by: Ignacio Hagopian <[email protected]>

---------

Signed-off-by: Ignacio Hagopian <[email protected]>

* trace cleanup

---------

Signed-off-by: Ignacio Hagopian <[email protected]>
Co-authored-by: Ignacio Hagopian <[email protected]>

fix: check for nil override in NewBlockChain (#357)

fix rebase issue

fix a few rebase issues

add debug traces

fix: assume that having to create a new transaction state mean it hasn't happened.

This is a workaround, because it will always be false when restarting a node
that has started the transition but not necessarily completed it.

add logs to debug shadowfork

persist conversion state to db and use an LRU cache for active transition states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review

eth: add debug_conversionStatus RPC call (#392)

* eth: add debug_conversionStatus RPC call

* add debug trace4s

* Apply suggestions from code review

* export started/ended fields

fix post-verge sync (#404)

* fix post-verge sync

* review: fix truncated comment
@gballet gballet mentioned this pull request Apr 29, 2024
gballet added a commit that referenced this pull request May 2, 2024
quell linter issue

support Transition post tree in conversion

Refactor transition post genesis (#311)

* rewrite per-block conversion pointer management

* remove unused method

* fix: a branch that can verge at genesis or post genesis (#314)

* fix: import cycle in conversion refactor (#315)

* fix shadowfork panic in OpenStorageTrie

fix OpenStorageTrie: return an error instead of panicking if the account tree not a verkle tree (#321)

add a switch to force proof in blocks (#322)

* add a switch to force proof in blocks

* activate switch

* fix switch type

add switch to override the stride of the overlay conversion (#323)

* add switch to override the stride of the overlay conversion

* set a default stride of 10k

add a few traces for relaunch

add missing step in the chain of setting the override for overlay stride

fix: save and load transition state for block processing (#324)

* fix: save and load transition state for block processing

* log whether the tree is verkle in LoadTransitionState

* fix: ensure the transition is marked as started in insertChain

* dump saved address

* fix nil pointer panic

* remove stacktrace that is no longer useful

* fix a panic

* fix build

* check: copy current account address BEFORE it's saved

* mandatory panic fix

* Remove debug fmt.Println

* more cleanup + comments

fix: ensure StorageProcessed is properly recovered

add traces for gas pool overflow

cmd/{geth, utils}: add a command to clear verkle costs (#326)

* cmd/{geth, utils}: add a command to clear verkle costs

fix:  boolean issue

fix: load finalization state in FinalizeAndAssemble (#340)

* Conversion and TransitionTrie fixes (#346)

* fixes

Signed-off-by: Ignacio Hagopian <[email protected]>

* remove old comment

Signed-off-by: Ignacio Hagopian <[email protected]>

---------

Signed-off-by: Ignacio Hagopian <[email protected]>

* trace cleanup

---------

Signed-off-by: Ignacio Hagopian <[email protected]>
Co-authored-by: Ignacio Hagopian <[email protected]>

fix: check for nil override in NewBlockChain (#357)

fix rebase issue

fix a few rebase issues

add debug traces

fix: assume that having to create a new transaction state mean it hasn't happened.

This is a workaround, because it will always be false when restarting a node
that has started the transition but not necessarily completed it.

add logs to debug shadowfork

persist conversion state to db and use an LRU cache for active transition states (#375)

* persist conversion state to db

* fix: don't recreate LRU when writing state

* opt: only write state to db if not already present in LRU

* fix: rlp can't encode TransitionState

* fix: use gob because binary.Write does not support structs 🤦‍♂️

* fix: nil pointer panic

* add logs to debug shadowfork

* no such thing as not enough traces

* ditto

* fix stupid bug

* add a comment for readability

* add more traces

* Lock the state transition during conversion (#384)

* heavy handed approach: lock the state transition during conversion

* also lock transition state loading/unloading

* reduce logs verbosity

* add conversion test to workflow (#386)

* add conversion test to workflow

* mandatory -f switch fix in rm

* forgot & at the end of the geth command

* remove incorrect kill

* add debug traces

* add an overlay stride

* fix typo

* Apply suggestions from code review

eth: add debug_conversionStatus RPC call (#392)

* eth: add debug_conversionStatus RPC call

* add debug trace4s

* Apply suggestions from code review

* export started/ended fields

fix post-verge sync (#404)

* fix post-verge sync

* review: fix truncated comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant