
Add Continuity Check Script to celo-migrate #282

Open
wants to merge 12 commits into celo10

Conversation

alecps commented Dec 9, 2024

Adds a standalone command for checking db continuity without performing a migration.

Unit tests are added for the CheckContinuity function.

The continuity script can be run in "fail-fast" mode to quickly check whether the db is corrupted, or run normally to print out every gap detected in the data.
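
A minimal sketch of the fail-fast vs. full-scan behaviour described above (illustrative only; the element and gap types and the function names are hypothetical stand-ins, not the actual celo-migrate API):

package continuity

// element and gap are hypothetical stand-ins for the block data and
// gap-report types used by the script.
type element struct{ number uint64 }

type gap struct{ prev, next uint64 }

// follows reports whether next comes exactly one block after prev.
func follows(prev, next element) bool { return next.number == prev.number+1 }

// scan walks the sequence in order. In fail-fast mode it stops at the first
// gap; otherwise it records every gap and reports them all at the end.
func scan(elems []element, failFast bool) []gap {
	var gaps []gap
	for i := 1; i < len(elems); i++ {
		if !follows(elems[i-1], elems[i]) {
			g := gap{prev: elems[i-1].number, next: elems[i].number}
			if failFast {
				return []gap{g}
			}
			gaps = append(gaps, g)
		}
	}
	return gaps
}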

Drive by changes:

  • Adds fixes for a couple minor indexing errors
  • Typos
  • Fixes a couple error checks (fixing these types of error checks was part of the audit feedback, but I found a couple that were missed)

Fixes https://github.com/celo-org/celo-blockchain-planning/issues/832

Note: This PR originally added the continuity check logic to the actual migration code path (in addition to having a standalone script), but in an effort to reduce unnecessary changes to the migration logic those changes were reverted and moved to #297. This means that the migration script itself will not detect any gaps in the data being migrated, so the standalone continuity script should be run on the source db before running the migration script.

palango left a comment

Looks good, but I wonder if we should have a separate command to check a datadir before the migration, independent of the pre-migration. What do you think?

op-chain-ops/cmd/celo-migrate/ancients.go (outdated, resolved)
op-chain-ops/cmd/celo-migrate/ancients.go (outdated, resolved)
op-chain-ops/cmd/celo-migrate/non-ancients.go (outdated, resolved)
@alecps changed the title from "check for gaps in block numbers and throw if found during migration" to "Check for gaps in block numbers and throw if found during migration" Dec 10, 2024
@alecps force-pushed the alecps/migrationBlockGaps branch from 6842c02 to 8c18e28 on December 10, 2024 20:07
alecps (author) commented Dec 10, 2024

@palango Interesting idea. I need to think about that a bit more. My first reaction is that it would add an extra step + time to the migration process. I'm also trying to work out if it's even possible to have gaps in ancient blocks, as the ancient db is append only. Having a separate script could be a quick way to bring extra peace of mind. Will give this some more thought tomorrow.

palango commented Dec 11, 2024

> @palango Interesting idea. I need to think about that a bit more. My first reaction is that it would add an extra step + time to the migration process. I'm also trying to work out if it's even possible to have gaps in ancient blocks, as the ancient db is append only. Having a separate script could be a quick way to bring extra peace of mind. Will give this some more thought tomorrow.

Not a requirement, but it might be too late for a full resync if people run the pre-migration shortly before the scheduled block.

piersy commented Dec 11, 2024

My feeling is that a separate command for this is unnecessary; if we want people to check this earlier, then we should just message them to run the pre-migration earlier.

@alecps marked this pull request as draft December 12, 2024 20:22
alecps (author) commented Dec 12, 2024

I've observed some strange behavior that I want to look into more. It seems there may be an earlier gap that this branch does not detect, which is preventing blocks from freezing beyond block 5,002,000. This branch only throws on a later gap that starts at block 5,835,918. It's possible this is a partial gap, where only receipt data is missing. Will re-open this for review when I've gotten to the bottom of what's going on.

@alecps force-pushed the alecps/migrationBlockGaps branch from 3bb956f to 66c7b86 on December 18, 2024 03:06
@alecps marked this pull request as ready for review December 18, 2024 03:07
@alecps requested review from piersy and palango December 18, 2024 03:07
alecps (author) commented Dec 18, 2024

I added checks for all the necessary data (receipts, headers, total difficulty, etc.) for both ancient and non-ancient blocks. Previously, the script only checked for gaps in headers. I haven't been able to test with gaps in ancients because I haven't been able to get a node to freeze blocks beyond the first gap. It's unclear whether it's possible to actually have a gap in the ancient data.

alecps (author) commented Dec 18, 2024

@palango After thinking about it more, I do actually think it would be nice to have a command for testing data integrity. Though it isn't strictly necessary, it might be appreciated by anyone who runs into data corruption issues and wants to quickly check whether the database they're using is okay before trying a second time. We could even consider recommending that people run it first as a precaution, but that might not be necessary. I think it might make sense to add in a follow up PR so that this one doesn't get too big. What do you think?

palango commented Dec 18, 2024

> @palango After thinking about it more, I do actually think it would be nice to have a command for testing data integrity. Though it isn't strictly necessary, it might be appreciated by anyone who runs into data corruption issues and wants to quickly check whether the database they're using is okay before trying a second time. We could even consider recommending that people run it first as a precaution, but that might not be necessary. I think it might make sense to add in a follow up PR so that this one doesn't get too big. What do you think?

Totally agree, this can be added separately. And maybe this doesn't need to be coupled to the migration at all, even though it would make sense to reuse code if possible.

@alecps alecps marked this pull request as draft December 19, 2024 22:40
github-actions bot commented Jan 4, 2025

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions bot added the Stale label Jan 4, 2025
@palango removed the Stale label Jan 6, 2025
@alecps requested a review from piersy January 17, 2025 22:52
Comment on lines 43 to 45
bodies: [][]byte{[]byte("body0"), []byte("body1"), []byte("body2"), []byte("body3")},
receipts: [][]byte{[]byte("receipt0"), []byte("receipt1"), []byte("receipt2"), []byte("receipt3")},
tds: [][]byte{[]byte("td0"), []byte("td1"), []byte("td2"), []byte("td3")},

Adding all these fields here is verbose and confusing because it makes it look like the test is checking some of these values when it is not.

I suggest using a function like makeRange(start int, hashes, encodedHeaders [][]byte) that ensures that all other fields have the correct length

alecps (author) commented Jan 23, 2025

Got it. Let me know if what I have now looks better to you. I used a makeRange helper to make the tests more concise, but since many of the tests are specifically checking for differences in the lengths of the various other fields, I'm still passing slices into the helper for all the fields (not just hashes and encodedHeaders). This ended up looking easier to read, in my opinion, than passing the desired lengths as int parameters. I can't see how the signature you suggested above, makeRange(start int, hashes, encodedHeaders [][]byte), could support the test cases where we vary the lengths of the various fields. Let me know if you had something else in mind!
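
A rough sketch of the helper shape being discussed (the struct and field names are illustrative stand-ins, not the exact code in the PR):

// rlpBlockRange is an illustrative stand-in for the block range struct used in the tests.
type rlpBlockRange struct {
	start                                  uint64
	hashes, headers, bodies, receipts, tds [][]byte
}

// makeRange builds a test range; callers pass every slice explicitly so that
// individual tests can deliberately mismatch the slice lengths.
func makeRange(start uint64, hashes, headers, bodies, receipts, tds [][]byte) rlpBlockRange {
	return rlpBlockRange{start: start, hashes: hashes, headers: headers, bodies: bodies, receipts: receipts, tds: tds}
}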

piersy commented Jan 22, 2025

Hey @alecps, I see what you're trying to do here, but this has become quite huge: 861 lines to check that each block follows its parent.

The db continuity check runs pretty fast for me (1 minute for alfajores), so I would be in favour of reverting the changes to the pre-existing migration code and just adding the db check as a separate command. This gives us the benefit of not having to modify the audited migration code. I don't want to unilaterally decide this though, so please chime in @karlb & @palango.

@alecps force-pushed the alecps/migrationBlockGaps branch 2 times, most recently from 1715446 to 6aaf1e3 on January 22, 2025 23:14
@alecps changed the title from "Check for gaps in block numbers and throw if found during migration" to "Add Continuity Check Script to celo-migrate" Jan 22, 2025
@@ -216,7 +216,7 @@ func writeAncientBlocks(ctx context.Context, freezer *rawdb.Freezer, in <-chan R
return fmt.Errorf("failed to write block range: %w", err)
}
blockRangeEnd := blockRange.start + uint64(len(blockRange.hashes)) - 1
log.Info("Wrote ancient blocks", "start", blockRange.start, "end", blockRangeEnd, "count", len(blockRange.hashes), "remaining", totalAncientBlocks-blockRangeEnd)
log.Info("Wrote ancient blocks", "start", blockRange.start, "end", blockRangeEnd, "count", len(blockRange.hashes), "remaining", totalAncientBlocks-(blockRangeEnd+1))
alecps (author):

This indexing error was missed by the audit
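
To illustrate with made-up numbers: if totalAncientBlocks is 100 and the range just written ends at blockRangeEnd = 49 (i.e. blocks 0 through 49 have been written), then 50 blocks remain, which is 100 - (49 + 1); the previous expression would have logged 51.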

@@ -61,7 +92,7 @@ func openDB(chaindataPath string, readOnly bool) (ethdb.Database, error) {

// Opens a database without access to AncientsDb
func openDBWithoutFreezer(chaindataPath string, readOnly bool) (ethdb.Database, error) {
if _, err := os.Stat(chaindataPath); errors.Is(err, os.ErrNotExist) {
if _, err := os.Stat(chaindataPath); err != nil {
alecps (author) commented Jan 22, 2025

Fixing this type of error check was part of the audit feedback. I fixed several others in this PR, but somehow missed this one.
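
(For context, a general Go note rather than anything stated in the PR: os.Stat can fail for reasons other than the path not existing, such as permission errors, so matching only os.ErrNotExist would silently ignore those failures.)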

alecps (author) commented Jan 23, 2025

@piersy Okay, I've reverted all the changes to the migration code path. What remains is just the code for the continuity script, the unit tests for CheckContinuity, a fix for an indexing error, and some other super minor drive-by changes.

I've moved the rest of the changes that weave the continuity checks into the migration logic to alecps/addContinuityCheckToMigration

@alecps force-pushed the alecps/migrationBlockGaps branch from a10e683 to 1e8c9cc on January 23, 2025 19:25
for i := startBlock; i < endBlock; i += batchSize {
count := min(batchSize, endBlock-i+1)
count := min(batchSize, endBlock-i)
alecps (author) commented Jan 23, 2025

The audit missed this indexing error. It should be endBlock - i because endBlock is not inclusive. This didn't cause any problems before because the freezer.AncientRange function returns at most count elements, and this indexing error could only have caused a problem when reading the last section of the ancients (when the number of remaining blocks is less than batchSize).
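
To illustrate with made-up numbers: if startBlock is 0, endBlock is 25000 (exclusive) and batchSize is 10000, the final iteration starts at i = 20000; the correct count is endBlock - i = 5000, whereas endBlock - i + 1 would request 5001 blocks.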

var err error

blockRange.hashes, err = freezer.AncientRange(rawdb.ChainFreezerHashTable, start, count, 0)
blockRange, err := loadAncientRange(freezer, start, count)
alecps (author):

@piersy This is the only refactor to the migration code path from the audited code that I haven't reverted. It's straightforward enough that I thought it might be okay to keep, since we need to use loadAncientRange in the continuity script. Happy to revert it though if you think that's best

@alecps requested a review from piersy January 23, 2025 21:23
// Follows checks if the current block has a number one greater than the previous block
// and if the parent hash of the current block matches the hash of the previous block.
func (e *RLPBlockElement) Follows(prev *RLPBlockElement) (err error) {
if e.Header().Number.Uint64() != prev.Header().Number.Uint64()+1 {
palango commented Jan 24, 2025

There are 12 instances of .Header().Number.Uint64(); maybe worth putting this in a small function for legibility.
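
For example, something like the following (an illustrative sketch of the suggestion, not code from the PR):

// number returns the block number of this element, wrapping the repeated
// Header().Number.Uint64() call.
func (e *RLPBlockElement) number() uint64 {
	return e.Header().Number.Uint64()
}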
