Fix deep reorgs #16594
We have completed a sync-from-scratch test on mainnet with this patch.
aok
…ious generators in test chains. Use a longer chain, with block generators, in the long_reorg test add new test block chain fixture for reorgs with lighter weight blocks, this covers reorgs to greater block heights than the current peak
…n block, to exercise more parts of our validation logic in simulations
…nt_in_chain(), but return the whole sequence of block hashes
…e database, avoiding parsing and constructing a whole BlockRecord object for block traversed
…ss block validations (instead of re-computing them from scratch for every block)
… generators and block references
…y, we don't get here
aok
This PR should be reviewed one commit at a time.
Purpose:
The node is currently having a hard time with deep reorgs. The issue can be split into 2 parts:
1. Some parts of block validation require some previous blocks to be loaded in the block record cache in the blockchain object. When performing a reorg deeper than the cache size, those block records may not be in the cache, and the lookup fails.
2. When validating coin spends on the forked chain, we re-run blocks in order to compute the additions and removals for the fork of the chain. This is done very inefficiently, causing the same blocks to be run many times.
When validating blocks on the fork of the chain (before accepting it and updating our peak) we need to be able to pick up block records (and generators) from blocks that are not in the main chain. This requires looking them up from the database. One case of interest, that has previously not been covered by a test, is when a block reference points into the forked chain (i.e. a block not in the main chain).
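To illustrate the second part above (re-running the same blocks to recompute additions and removals), here is a minimal sketch of carrying fork state from one block validation to the next, so that each fork block is processed exactly once. The `ForkState` class, its fields, and the string coin ids are hypothetical and only meant to show the idea; they are not the types used in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ForkState:
    """Hypothetical container for the coin changes made on a fork so far."""

    fork_height: int  # last height shared with the main chain
    peak_height: int  # height of the last fork block applied
    additions: Dict[str, int] = field(default_factory=dict)  # coin id -> height created
    removals: Set[str] = field(default_factory=set)  # coin ids spent on the fork

    def apply_block(self, height: int, added: List[str], removed: List[str]) -> None:
        # Blocks must be applied in order, so each one is processed exactly once.
        if height != self.peak_height + 1:
            raise ValueError(f"expected height {self.peak_height + 1}, got {height}")
        for coin_id in added:
            self.additions[coin_id] = height
        self.removals.update(removed)
        self.peak_height = height


# Validate two fork blocks on top of a fork point at height 10.
state = ForkState(fork_height=10, peak_height=10)
state.apply_block(11, added=["coin-a"], removed=[])
state.apply_block(12, added=["coin-b"], removed=["coin-a"])
```

Validating block 13 then only needs `state` plus block 13 itself, rather than a replay of blocks 11 and 12.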
Current Behavior:
In a long/deep reorg, the node may fail with KeyError while looking up block records that are not in the cache.
New Behavior:
Long/deep reorgs work as expected.
Changes
These are all the commits in this PR.
test block chains, BlockTools updates
Update BlockTools to make reorg test blockchains use block references and transactions. These updated test chains have been put in the test-cache repo and released as 0.38.0.
- add feature to BlockTools to allow including block references to previous generators in test chains. Use a longer chain, with block generators, in the long_reorg test. Add new test block chain fixture for reorgs with lighter weight blocks, this covers reorgs to greater block heights than the current peak
- add feature to BlockTools to include transactions in every transaction block, to exercise more parts of our validation logic in simulations
fall back to pulling block records from DB
Allow the parts of block validation that currently only pull block records from the cache to fall back to pulling them from the database.
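The fallback can be pictured roughly as below. This is a simplified sketch, not the code in this PR: `RecordLookup`, the dict-based cache, and the `load_from_db` callable are all hypothetical stand-ins for the block record cache and the block store.

```python
from typing import Callable, Dict, Optional


class RecordLookup:
    """Hypothetical lookup: in-memory cache first, then a slower backing store."""

    def __init__(
        self,
        cache: Dict[str, dict],
        load_from_db: Callable[[str], Optional[dict]],
    ) -> None:
        self._cache = cache
        self._load_from_db = load_from_db

    def get(self, header_hash: str) -> dict:
        record = self._cache.get(header_hash)
        if record is None:
            # Cache miss: the block may be older than the cache window, or on a fork.
            record = self._load_from_db(header_hash)
        if record is None:
            # Only fail when the record is missing from both the cache and the store.
            raise KeyError(header_hash)
        return record


# A plain dict stands in for the database; only "bb" is cached.
db = {"aa": {"height": 1}, "bb": {"height": 2}}
lookup = RecordLookup(cache={"bb": db["bb"]}, load_from_db=db.get)
```

With a cache-only lookup, `lookup.get("aa")` would raise KeyError; with the fallback it returns the record from the store.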
Preserve fork state across blocks, when validating
Preserve the state of the fork across block validations. This is done in ForkInfo, by preserving all additions and removals of the fork. The "fix reorg performance" commit is the big one that passes around information about the fork that's currently being validated (if any).
Update and add new tests
Extend the existing long reorg test to be even longer (to be deeper than the block cache size).
Also add tests on the FullNode object including one with two nodes syncing via the network protocol.
make the timeout for collecting 3 peaks configurable. Set it to 1 second in the simulation tests to avoid waiting on CI (already addressed by Sync no farmer #16698 and make sure we set max_sync_wait to 0 in all tests #16771)
Benchmark
I profiled the new reorg logic via test_long_reorg_nodes() and the Blockchain.py test_long_reorg() tests.
test_long_reorg_nodes() (profile image not preserved)
test_long_reorg() (profile image not preserved)
future improvements
get_block_generator()
There are still some significant performance improvements to be made to the get_block_generator() function. If the fork chain's height-to-hash map were recorded in the ForkInfo object, it could avoid the (expensive) call to lookup_fork_chain() entirely.
_reconsider_peak()
This function pulls in all full blocks of the fork chain in order to apply them one at a time. At this point we have access to the ForkInfo object, so we could already know the block hashes to apply. Then we could load them one at a time from the database, and apply them one at a time.
It appears we mainly build the blocks_to_add up front to be able to reverse the order of blocks. However, since we also have all the additions and removals, we don't technically need to re-run the blocks either at this point. But we would need to record the height for removals in that case.
The loop to fetch the full blocks and block records from the DB is bottlenecked on the parsing, and DB access.
There's some work started to address this:
#16787
#16793
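A rough sketch of the "load them one at a time" idea described above for _reconsider_peak(), assuming a height-to-hash map for the fork has already been recorded during validation. The function name `iter_fork_blocks`, the `load_block` callable, and the sample data are hypothetical; this is not how the linked PRs implement it.

```python
from typing import Callable, Dict, Iterator, Tuple


def iter_fork_blocks(
    height_to_hash: Dict[int, str],
    load_block: Callable[[str], bytes],
) -> Iterator[Tuple[int, bytes]]:
    """Yield (height, block) in ascending height order, loading one block at a time."""
    for height in sorted(height_to_hash):
        # Only one full block is loaded per step, instead of the whole fork up front.
        yield height, load_block(height_to_hash[height])


# Hypothetical data: hashes recorded while the fork was being validated.
fork_hashes = {12: "h12", 11: "h11", 13: "h13"}
store = {"h11": b"block-11", "h12": b"block-12", "h13": b"block-13"}
applied = [height for height, _ in iter_fork_blocks(fork_hashes, store.__getitem__)]
```

Because the heights are known up front, the blocks can be visited in the right order without first collecting them all into a list and reversing it.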
protocol error when syncing
This test: tests/core/full_node/test_full_node.py::test_long_reorg_nodes exposes an issue where a node attempts to fetch a block at a height that doesn't exist. This happens when a node is exposed to a heavier chain with a lower peak height. It seems like we try to request blocks at our peak height, rather than the peer's peak height.
This is visible in the log as:
This is addressed by: #16779
overlapping block requests
When syncing, it appears we make requests for overlapping block ranges. When this is printed to the log:
Blocks 32, 64, 96 are requested and added twice from the peer. This seems to be a mistake.
This is addressed by: #16792
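For illustration, one way such duplicates can arise is when consecutive batches are built as inclusive ranges that share an endpoint, e.g. (0, 32), (32, 64), (64, 96). That is an assumption on my part, not a diagnosis of the actual sync code. A minimal sketch of non-overlapping inclusive ranges, with a hypothetical `batch_ranges` helper and a batch size of 32:

```python
from typing import Iterator, Tuple


def batch_ranges(start: int, end: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) height ranges covering start..end with no overlap."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    first = start
    while first <= end:
        last = min(first + batch_size - 1, end)
        yield first, last
        # The next batch starts one past the previous end, so no height repeats.
        first = last + 1


ranges = list(batch_ranges(0, 100, 32))
```

Here heights 32, 64 and 96 each fall in exactly one range.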
throwing get_peak()
In the test we wait for all nodes to have the same peak, indicating they're synced. Sometimes get_peak() throws:
This is addressed by: #16776
invalid weight proof in test
When enabling the shorter chains in tests/core/full_node/test_full_node.py::test_long_reorg_nodes, some combinations fail with the following error log:
followed by:
It would be good to understand why.