Fixes for re-running migration script on same destination db #246

Merged
alecps merged 2 commits into celo9 from alecps/resetNonAncients on Oct 3, 2024

alecps commented Oct 3, 2024

  • Adds --reset flag to delete everything in the destination directory except for /ancients. This is useful as a fallback if the destination db is failing to mirror the source db after the rsync command is executed.
  • Adds --checksum option to rsync command to ensure that file contents (rather than just size and timestamps) are used to determine which files to transfer. This provides higher certainty that the destination directory will mirror the source directory after the transfer.
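
To make the second bullet concrete, here is a minimal Go sketch of how an rsync invocation with `--checksum` might be assembled. The helper name `rsyncArgs`, the example paths, and the exact option set are assumptions for illustration and are not taken from the migration script; `-a`, `--delete`, `--exclude`, and `--checksum` are standard rsync options.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// rsyncArgs builds the argument list for copying everything except the
// ancients directory from src to dst. Hypothetical helper, not the
// migration script's real code.
func rsyncArgs(src, dst string, checksum bool) []string {
	args := []string{
		"-a",                 // archive mode: recurse and preserve metadata
		"--delete",           // drop destination files that are gone from the source
		"--exclude=ancients", // ancient blocks are handled separately
	}
	if checksum {
		// Compare file contents instead of only size and modification time.
		args = append(args, "--checksum")
	}
	// A trailing slash on src copies the directory's contents, not the directory itself.
	return append(args, strings.TrimSuffix(src, "/")+"/", dst)
}

func main() {
	cmd := exec.Command("rsync", rsyncArgs("/data/old/chaindata", "/data/new/chaindata", true)...)
	// Print the command instead of running it, so the sketch has no side effects.
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

The trade-off is speed: with `--checksum`, rsync reads every file on both sides to compare contents, so a re-run takes longer than the default size-and-timestamp check.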

alecps requested a review from piersy on October 3, 2024 17:11

piersy commented Oct 3, 2024

So I see you've added the checksum flag to the rsync command and also a flag to delete the non-ancient data. Is the checksum flag not working on its own? It would be much better if we didn't need to re-copy the non-ancient data, as that will take at least a few minutes. @alecps

alecps commented Oct 3, 2024

@piersy To be honest, I haven't been able to reproduce the issue you saw where there were multiple blocks at the same height. If you have any pointers as to how I might do that, that would be helpful. The --checksum flag appears to be working in my local tests and also seems like a good idea in general given our use case. The --reset flag I added is more of a fallback in case we ever encounter an issue (similar to the one we saw last week) where a partner needs to re-run the migration script to get the correct migration block hash, etc. Trying once with the --reset flag will be relatively quick. As long as the problem is not related to the ancients db (which it likely wouldn't be), this would save many hours by not deleting and re-migrating all the ancient blocks, which could always be our next recommendation if --reset doesn't work.

The --reset flag won't be the default behavior and will only be used for troubleshooting if the migration is not successful the first time, so we won't be re-copying all the non-ancient data in the happy case. Based on my research and testing, the rsync flag really should be enough on its own. But rsync might fail to behave as we expect for reasons particular to certain setups (network connectivity, file permissions, file system limits, etc.), and we've already seen an instance where this functionality would have been useful, so I think it makes sense to add it as a fallback.

piersy commented Oct 3, 2024

> @piersy To be honest, I haven't been able to reproduce the issue you saw where there were multiple blocks at the same height. If you have any pointers as to how I might do that, that would be helpful. The --checksum flag appears to be working in my local tests and also seems like a good idea in general given our use case. The --reset flag I added is more of a fallback in case we ever encounter an issue (similar to the one we saw last week) where a partner needs to re-run the migration script to get the correct migration block hash, etc. Trying once with the --reset flag will be relatively quick. As long as the problem is not related to the ancients db (which it likely wouldn't be), this would save many hours by not deleting and re-migrating all the ancient blocks, which could always be our next recommendation if --reset doesn't work.

Got you @alecps, there are some conflicts to resolve though

alecps commented Oct 3, 2024

@piersy Another option, if we don't want to take any risks with the rsync command, would be to move it from the pre-migration step to the full-migration step and make --reset the default behavior, so that everything except /ancients is always cleared first. This would slow down the full migration step, though. I can do some testing to get a sense of how much.

@piersy piersy left a comment

Looking good

piersy commented Oct 3, 2024

> @piersy Another option, if we don't want to take any risks with the rsync command, would be to move it from the pre-migration step to the full-migration step and make --reset the default behavior, so that everything except /ancients is always cleared first. This would slow down the full migration step, though. I can do some testing to get a sense of how much.

I think it's good as it is. Pre-migrate followed by full migrate has worked for us in the past, so let's stick to that, as it keeps the migration time low. And if we run into problems, we always have --reset.
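
The outcome of this thread, an opt-in `--reset` that stays off in the happy path, might look like the sketch below. It uses the standard library `flag` package for self-containment rather than urfave/cli/v2, which the commit history says the tool itself uses; the `options` type, `parseOptions`, and the help text are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the hypothetical settings for one migration run.
type options struct {
	reset bool
}

// parseOptions parses command-line arguments. --reset defaults to false so
// that a normal run never re-copies the non-ancient data.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("migrate", flag.ContinueOnError)
	reset := fs.Bool("reset", false, "delete everything in the destination db except ancients before copying")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{reset: *reset}, nil
}

func main() {
	opts, err := parseOptions([]string{"--reset"})
	if err != nil {
		panic(err)
	}
	fmt.Println("reset requested:", opts.reset)
}
```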

alecps merged commit f5ddd80 into celo9 on Oct 3, 2024
52 of 53 checks passed
alecps deleted the alecps/resetNonAncients branch on October 3, 2024 19:01
karlb pushed a commit that referenced this pull request Oct 12, 2024
* add reset flag

* add --checksum to rsync options
karlb pushed a commit that referenced this pull request Oct 14, 2024
* add reset flag

* add --checksum to rsync options
alecps pushed a commit that referenced this pull request Oct 15, 2024
This works by loading the database of a celo
node. It then removes all existing blocks and
generates a new genesis block including the
existing state tree.

Migrate to urfave/cli/v2

Update op-chain-ops/cmd/op-migrate/main.go

Co-authored-by: Karl Bartel

Combine Cel2 migration scripts (#148)

* Initial script to play with celo DB history migration

* Can Read All the headers

Co-authored-by: Alec Schaefer

* Adds new command to migrate ancients db

* Adds comment

* Adds extension methods for transformation

* Implements Transform CeloBody

* Adds impl that runs steps in a concurrent pipeline

* Adds transformHead, verify hashing works

cleanup

* add migration for non-frozen blocks

* copy over entire db and modify in place, works with op-geth at piersy/minimal-data-migration

* remove unnecessary copying, cleanup code

* close and reopen DBs

* migrate newdb in place

* saving progress

Co-authored-by: Mariano Cortesi

* Refactor code to improve database migration process

* better logging

* refactor: inline parMigrateAncientRange

* Remove frozen blocks from nonAncient DB

* check hash matches on nonAncients migration

* clean up branch

Removes unused code, move code for better separation of concerns.

* decode into new types

* fix transformHeader

* make old freezer not readonly so that .meta files are created

* add configurable memory limit

* add comment about memory

* Added celo-dbmigrate Makefile target

* Added dockerfile for celo-dbmigrate and celo-migrate tools

* Workflow for running cel2-migration-tool

* Update cel2-migration-tool image registry

* update op-geth to point to https://github.com/celo-org/op-geth/commits/piersy/for-use-with-migrated-celo-datadir-use-gas-limit-differentiation-rebased-celo6/

* add celo6 logging

* rename scripts to celo-migrate-state and celo-migrate-blocks

* first pass at combining scripts

* saving progress on testing

* fix lint error, use %w to fmt errors

* add updated state migration input files to testdata

* add ability to run block and state migration separately or together

* add option for migrating only frozen blocks

* remove old scripts

* minor logging improvements in block migrations

* invert clearNonAncients flag logic --> keepNonAncients, make dry-run flag only apply to state migration

* adds README, improves logging

* fix lint err

* Fix Makefile and Dockerfile

* move createNewDbIfNotExists

* rename keep-non-ancients

* update TODO to add more context and state changes

* Remove channel buffers from ancients migration

Co-authored-by: Valentin Rodygin

* bump default batch size to 100000

* add back extended usage string

* add info on state migration to README

* remove --state-dry-run flag

* update default batch size to 50k

* Adding building for op images

* Setting our values for image registry and repository

* update README

* fix logging when newAncients > oldAncients

* fix return value when skipping ancients

* skip transforming block bodies that have already been transformed

* misc. fixes to get re-runs with --keep-non-ancients working

* adds TODO

* addresses cosmetic feedback

* add flag for specifying a buffer

* Show progress on rsync

* Update to latest op-geth

* state-migration: Refactor subtask

* state-migration: Use EIP1559 settings from deploy config

Fixes #135

* state-migration: Enable Fjord hardfork during migration

Fixes #160

* state-migration: Deterministically set migration block timestamp

Fixes #157

Sets the timestamp to be 5s after the last block.

* state-migration: Set WithdrawalsHash in Cel2 migration block

* fixup! Fix Makefile and Dockerfile

* add note to README about using snapshots for pre-migration

* Set blob gas header fields for transition block

These are now required to be set since cancun was activated.

* Use InitialBaseFee for pre-gingerbread transition

* Fix warnings about capitalized error strings

* Output chain config as marshalled JSON

* state-migration: Handle accounts with existing balance

Fixes #158

* remove allocs file, add instructions for how to generate allocs file to README, update TODOs

---------

Co-authored-by: Mariano Cortesi
Co-authored-by: Alec Schaefer
Co-authored-by: Javier Cortejoso
Co-authored-by: Paul Lange
Co-authored-by: Valentin Rodygin
Co-authored-by: Piers Powlesland
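
The "concurrent pipeline" item in the list above can be pictured with the following sketch, which chains read, transform, and write stages over unbuffered channels. It is an assumption-laden illustration: the `block` type, the stage functions, and the string payloads are hypothetical and do not mirror the tool's actual types or stage boundaries.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a block read from the old db.
type block struct {
	number uint64
	data   string
}

// read emits n blocks on an unbuffered channel and closes it when done.
func read(n uint64) <-chan block {
	out := make(chan block)
	go func() {
		defer close(out)
		for i := uint64(0); i < n; i++ {
			out <- block{number: i, data: "celo"}
		}
	}()
	return out
}

// transform rewrites each block while later blocks are still being read.
func transform(in <-chan block) <-chan block {
	out := make(chan block)
	go func() {
		defer close(out)
		for b := range in {
			b.data = "cel2"
			out <- b
		}
	}()
	return out
}

// write drains the pipeline and returns the blocks in arrival order.
func write(in <-chan block) []block {
	var written []block
	for b := range in {
		written = append(written, b)
	}
	return written
}

func main() {
	for _, b := range write(transform(read(3))) {
		fmt.Println(b.number, b.data)
	}
}
```

With one goroutine per stage and unbuffered channels, blocks arrive at the writer in the order they were read, and a slow stage applies back-pressure to the stages before it.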

Set balance of `CeloDistributionSchedule` contract (#162)

* state-migration: Initialize CeloDistributionSchedule

Fixes #155

* state-migration: Don't fail when distribution schedule update errors

* Review comments

state-migration: Set ParentBeaconRoot (#176)

This allows header validation to pass during snap sync

state-migration: Set address of distribution schedule (#177)

state-migration: Read total supply directly from state (#182)

* state-migration: Read totalSupply directly from storage

* Added trigger for updated dependencies

* Remove token bindings

---------

Co-authored-by: Javier Cortejoso

Fix l2 block older than l1 origin error (#184) (#187)

* Revert to using time.Now() for migration block

Instead of simply adding 5 to the parent block time.

We really do need a deterministic time for the migration block so that
all parties that run the migration arrive at the same migration block
but the problem is that op-geth requires that the L2 migration block
(aka l2 origin) occurs after the l1 origin (I guess the point where you
deploy the bridge contracts to the l1). When we migrate a partially
synced datadir the block before the transition block will be very old,
up to 4 years old! So of course it occurs before the l1 origin. So a fix
just to get things working is to use time.Now(), but probably we should
make this a configurable parameter.

* add flag to specify timestamp

* Update op-chain-ops/cmd/celo-migrate/main.go

---------

Co-authored-by: piersy

Migration script fixes (#179)

* Fixed migration for datadirs without ancients

The script was assuming that ancients would have been migrated and was
considering the numAncients-1 to be the next block to migrate but when
numAncients is zero that's a problem.

Also removed the logic for picking up where the db migration left off for the
level db, since it was complicating the logic and that process takes a
few seconds, which is nothing compared with the minutes taken to migrate
the ancients.

* Ensure that we set gas limit if migrating at pre-gingerbread point
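
One way the `numAncients-1` assumption can go wrong is sketched below, assuming the count is held in an unsigned integer (an assumption here, as is the helper name `lastAncient`). In Go, subtracting one from a `uint64` holding zero wraps around to the maximum value instead of failing.

```go
package main

import "fmt"

// lastAncient returns the number of the highest ancient block and whether
// one exists. Ancients are assumed to hold blocks 0..numAncients-1.
func lastAncient(numAncients uint64) (uint64, bool) {
	if numAncients == 0 {
		return 0, false // no ancients: numAncients-1 would wrap around
	}
	return numAncients - 1, true
}

func main() {
	var numAncients uint64 // zero, as in a datadir without ancients
	fmt.Println("naive:", numAncients-1) // wraps to the maximum uint64 value

	if _, ok := lastAncient(numAncients); !ok {
		fmt.Println("no ancient blocks to continue from")
	}
	last, _ := lastAncient(10)
	fmt.Println("last ancient:", last)
}
```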

Fix migration script gap in migrated blocks (#189)

* Fix migration script gap in migrated blocks

The range of ancient blocks to remove from the non ancients database was
off by one and resulted in a gap between ancients and non ancients.

Also corrected some log statements that were off by one.
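
The boundary arithmetic behind this kind of off-by-one can be sketched as follows. The helpers `pruneRange` and `checkContiguous` are hypothetical, and the sketch assumes the ancients hold blocks 0 through numAncients-1; it is not the script's actual logic.

```go
package main

import "fmt"

// pruneRange returns the half-open range [start, end) of block numbers to
// delete from the non-ancients db, assuming the ancients hold blocks
// 0..numAncients-1.
func pruneRange(numAncients uint64) (start, end uint64) {
	return 0, numAncients
}

// checkContiguous verifies that the first block left in the non-ancients db
// directly follows the last ancient block.
func checkContiguous(numAncients, firstNonAncient uint64) error {
	switch {
	case firstNonAncient > numAncients:
		return fmt.Errorf("gap: blocks %d..%d are missing", numAncients, firstNonAncient-1)
	case firstNonAncient < numAncients:
		return fmt.Errorf("overlap: blocks %d..%d are in both dbs", firstNonAncient, numAncients-1)
	}
	return nil
}

func main() {
	_, end := pruneRange(100)
	fmt.Println(checkContiguous(100, end))   // contiguous
	fmt.Println(checkContiguous(100, end+1)) // pruned one block too many
}
```

Treating the range as half-open means the first block kept in the non-ancients db is exactly `end`, which makes the boundary easy to assert after pruning.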

Add pre-migration command to migration script (#192)

* add pre-migration command, rsync and ancients run in parallel, remove onlyAncients flag

* remove block and state migration sub-commands

* make non ancient migration its own step, add flag to measure time

* add more granular timers

* open db without freezer in state migration, remove clearAll

* fix error

* remove update flag from rsync command, add rsync comments

* delete commented out versions of checkForPrevFullMigration

* remove aliases

* remove clearNonAncients flag

* remove measureTime flag, always log time measurements

* remove logging from help text

* remove db reset

* move scan for extra ancients into pre-migration

* update README

* rename extraAncientNumHashes to strayAncientBlocks

state-migration: Fail if account would be overwritten (#202)

* state-migration: Fail if account would be overwritten

* Review changes

* Review changes 2

* Fail in unclear state

* more changes

* Use whitelist to decide if nonce and state are overwritten
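
The overwrite rule from this commit can be sketched as a small guard. Everything here is a hypothetical simplification: the `address` type, the `exists` callback, and `checkOverwrite` stand in for the real state database and address types, and the sketch only models "fail unless allowlisted".

```go
package main

import "fmt"

// address is a hypothetical stand-in for an account address type.
type address string

// checkOverwrite returns an error if writing to addr would overwrite an
// existing account that is not on the allowlist.
func checkOverwrite(exists func(address) bool, allowlist map[address]bool, addr address) error {
	if exists(addr) && !allowlist[addr] {
		return fmt.Errorf("account %s already exists and is not allowlisted", addr)
	}
	return nil
}

func main() {
	existing := map[address]bool{"0xaaa": true, "0xbbb": true}
	exists := func(a address) bool { return existing[a] }
	allowlist := map[address]bool{"0xbbb": true}

	for _, a := range []address{"0xaaa", "0xbbb", "0xccc"} {
		fmt.Println(a, checkOverwrite(exists, allowlist, a))
	}
}
```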

Cosmetic changes to the migration script

- Use more lists for added readability
- Capitalize Alfajores and Celo
- Reorder scripting instructions to fit the actual order of operations
- Use GitHub callouts

migration: Add tests (#217)

* migration: Add tests for state migration

* migration: Fix issues shown by tests

* migration: pass allowlist into state migration

Allows for easier testing

* migration: Add test with allowlist

* Correct overwrite counter

* Use in memory DB

migration: Add working allowlist for Alfajores (#220)

* migration: Simplify tests

* migration: Add working allowlist for Alfajores

Adapt migration code to changes in StateDB

StateDB.CreateAccount used to copy existing balance, now it does not any
more.

migration: Set fields correctly for migration block (#212)

migration: Enable Granite (#226)

Write genesis file in state migration (#219)

* squash of #167

* add writeGenesis

* open old freezer in readonly mode, fix locking error

* remove devAlloc

* Revert "open old freezer in readonly mode, fix locking error"

This reverts commit e3fddea.

* fix locking error

* fix lint error, check errors, add comment

* remove comment

* filter extra genesis fields

* fix issue with genesis extra data

* update testdata

---------

Co-authored-by: Javier Cortejoso

migration: Overwrite create2deployer code (#233)

migration: Allow 'createx' preinstall (#238)

The code already exists on Alfajores and matches the one that would be
deployed, therefore we just allow this address.

add migration-block-number flag (#245)

* add migration-block-number flag

* address feedback

* move migration-block-number flag out of state migration options

Fixes for re-running migration script on same destination db  (#246)

* add reset flag

* add --checksum to rsync options
karlb pushed a commit that referenced this pull request Oct 16, 2024