Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicate headers download in initial sync #1589

Merged
merged 3 commits into from
Sep 4, 2017

Conversation

codablock
Copy link

Now that initial block download is delayed until the headers sync is done,
it was noticed that the initial headers sync may happen multiple times in
parallel in the case new blocks are announced. This happens because for
every block in INV that is received, a getheaders message is immediately
sent out resulting in a full download of the headers chain starting from
the point of where the initial headers sync is currently at. This happens
once for each peer that announces the new block. This slows down the
initial headers sync and increases the chance of another block being
announced before it is finished, probably leading to the same behavior
as already described, slowing down the sync even more...and so on.

This commit delays sending of GETHEADERS to later in case the chain is too
far behind while a new block gets announced. Header chains will still be
downloaded multiple times, but the downloading will start much closer
to the tip of the chain, so the damage is not that bad anymore.

This ensures that we get all headers from all peers, even if any of them
is on another chain. This should avoid what happened in
bitcoin#8054
which needed to be reverted later.

This fixes the Bitcoin issue bitcoin#6755

Now that initial block download is delayed until the headers sync is done,
it was noticed that the initial headers sync may happen multiple times in
parallel in the case new blocks are announced. This happens because for
every block in INV that is received, a getheaders message is immediately
sent out resulting in a full download of the headers chain starting from
the point of where the initial headers sync is currently at. This happens
once for each peer that announces the new block. This slows down the
initial headers sync and increases the chance of another block being
announced before it is finished, probably leading to the same behavior
as already described, slowing down the sync even more...and so on.

This commit delays sending of GETHEADERS to later in case the chain is too
far behind while a new block gets announced. Header chains will still be
downloaded multiple times, but the downloading will start much closer
to the tip of the chain, so the damage is not that bad anymore.

This ensures that we get all headers from all peers, even if any of them
is on another chain. This should avoid what happened in
bitcoin#8054
which needed to be reverted later.

This fixes the Bitcoin issue bitcoin#6755
@codablock
Copy link
Author

codablock commented Sep 1, 2017

I added a new commit on top that should fix the failing tests.

There is probably still an issue in case someone got unlucky when the initial headers sync peer is chosen and that one is really much behind. I'm going to test this on testnet as there seem to be many peers lurking around which stopped catching up. I've got an idea how to fix this but I need to verify this first.

@UdjinM6 UdjinM6 added this to the 12.2 milestone Sep 3, 2017
The delaying of GETHEADERS in combination with very old block times in
test cases resulted in the delaying being triggered when the first newly
mined block arrives. This results in a completely stalled sync.

This is fixed by avoiding delaying in when running tests.
Peers which stop sending us headers too early are very likely peers which
did not catch up before and stalled for some reason. We should disconnect
these peers and chose another one to continue.
@codablock
Copy link
Author

codablock commented Sep 3, 2017

I added another commit on top. What I assumed about not catched up peers was correct and could easily be reproduced on testnet.

If your fresh node got unlucky and chose a not catched up peer for the initial headers download, the initial headers download would stall forever (because the headers tip never comes near the current time). As I added delaying of GETHEADERS messages with my initial commit, there would never be a new attempt an downloading a longer chain from another peer. This in turn resulted in the delayed sending of GETHEADERS to be never triggered and thus stall everything.

The solution is to check for finished HEADER chains (if nCount < MAX_HEADERS_RESULTS, including 0) and then disconnect that peer if we did not come close to the tip. This triggers another attempt for the initial headers download, which will start from where the previous one finished.

I was not sure about the disconnecting of the peer, as that is quite hard. But at the same time I think that such a peer should never have tried to send us any headers, that's why GETHEADERS is actually ignored while a node is in IBD.

I'm not sure why there are so many nodes on testnet which are so far behind. In a discussion with @Alex-Werner he mentioned that many 0.12.1 nodes might suffer from an old sync problem with super blocks and thus be stalled forever. Good these nodes were actually there, as otherwise it would have been hard to figure out this new sync problem :)

Some of these nodes might actually also have forked away, because I see some headers syncing action with these after my node has finished initial headers download. This looks like they have foked shorty before they stopped catching up. Not sure if someone should look into excessive forking in testnet.

EDIT: I feel the PR is ready to be merged after some review and testing.

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Slightly tested, seems to work as expected - picked two peers who were stuck at different height and they were indeed disconnected so sync continued from a better one (3rd peer).

ACK

@UdjinM6 UdjinM6 merged commit 169afaf into dashpay:v0.12.2.x Sep 4, 2017
@codablock codablock deleted the fix_double_headers branch January 21, 2018 13:12
UdjinM6 added a commit to UdjinM6/dash that referenced this pull request Mar 26, 2018
Reverts "Fix duplicate headers download in initial sync (dashpay#1589)" and all following fixes

This reverts commit 169afaf.
UdjinM6 added a commit to UdjinM6/dash that referenced this pull request Apr 18, 2018
Reverts "Fix duplicate headers download in initial sync (dashpay#1589)" and all following fixes

This reverts commit 169afaf.
UdjinM6 added a commit that referenced this pull request Apr 20, 2018
…handling block inv correctly (#2032)

* Drop custom logic for delaying GETHEADERS

Reverts "Fix duplicate headers download in initial sync (#1589)" and all following fixes

This reverts commit 169afaf.

* Fix duplicate initial headers sync
andvgal pushed a commit to energicryptocurrency/gen2-energi that referenced this pull request Jan 6, 2019
…handling block inv correctly (dashpay#2032)

* Drop custom logic for delaying GETHEADERS

Reverts "Fix duplicate headers download in initial sync (dashpay#1589)" and all following fixes

This reverts commit 169afaf.

* Fix duplicate initial headers sync
PastaPastaPasta added a commit that referenced this pull request Aug 9, 2024
, bitcoin#24178, bitcoin#24171, bitcoin#25404, bitcoin#25514, bitcoin#25720, partial bitcoin#23832, bitcoin#24169, bitcoin#25454 (headers backports)

c92b0f5 merge bitcoin#25720: Reduce bandwidth during initial headers sync when a block is found (Kittywhiskers Van Gogh)
0f9ece0 merge bitcoin#25514: Move CNode::nServices and CNode::nLocalServices to Peer (Kittywhiskers Van Gogh)
c9923ca partial bitcoin#25454: Avoid multiple getheaders messages in flight to the same peer (Kittywhiskers Van Gogh)
26d477b revert: Fix duplicate initial headers sync (Kittywhiskers Van Gogh)
abccb2d test: drop genesis block from `blockheader_testnet3` (Kittywhiskers Van Gogh)
0574a7d merge bitcoin#25404: Use MAX_BLOCKS_TO_ANNOUNCE consistently (Kittywhiskers Van Gogh)
ed871d2 merge bitcoin#24171: Sync chain more readily from inbound peers during IBD (Kittywhiskers Van Gogh)
a04290f merge bitcoin#24178: Respond to getheaders if we have sufficient chainwork (Kittywhiskers Van Gogh)
bcafa28 merge bitcoin#24909: Move and rename pindexBestHeader, fHavePruned (Kittywhiskers Van Gogh)
70485cb partial bitcoin#24169: Add --enable-c++20 option (Kittywhiskers Van Gogh)
27e885d merge bitcoin#23880: Serialize cmpctblock at most once in NewPoWValidBlock (Kittywhiskers Van Gogh)
9f7ac69 merge bitcoin#24024: Remove cs_main lock annotation from ChainstateManager.m_blockman (Kittywhiskers Van Gogh)
9399f90 partial bitcoin#23832: Changes time variables from int to chrono (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * Dependent on #6085

  * Dependency for #6098

  * ~~[bitcoin#25514](bitcoin#25514) removes `peers.spvNodeConnections` and `peers.fullNodeConnections` reporting from `CalculateNumConnectionsChangedStats` as the services flags used to distingush between the two have been moved to the `Peer` struct, accessable only through `PeerManager`.~~

    ~~As `PeerManager` isn't accessable to `CConnman`, even if a new public function was exposed through `PeerManger` (as we have for `IsInvInFilter` and others or we try to access the value through `GetNodeStateStats`), `CConnman` would be unable to leverage it.~~ Resolved with patch by UdjinM6, thanks!

  * [bitcoin#23880](bitcoin#23880) introduces code that is not valid C++20 (but valid C++17) and therefore, required a partial backport of [bitcoin#24169](bitcoin#24169) (fae6790) to make the code C++20 legal.

  * [bitcoin#25454](bitcoin#25454) introduces a 10-point penalty for remitting more than `MAX_BLOCKS_TO_ANNOUNCE` unconnected block headers. `blockheader_testnet3.hex` (introduced in [bitcoin#16551](bitcoin#16551), part of [dash#5963](#5963)), unlike in Bitcoin, includes the genesis block.

    By definition of a genesis block, there is no block before it that connects to, which causes the 10-point penalty to trip and `p2p_dos_header_tree.py` to fail (see below). This has been remedied by removing the genesis block from the test data to match upstream and also because no node has a good reason to ever broadcast the genesis block as-is over P2P.

    <details>

    <summary>Test Failure</summary>

    ```
    dash@6a2649cc721f:/src/dash$ ./test/functional/p2p_dos_header_tree.py
    2024-06-17T17:59:35.874000Z TestFramework (INFO): Initializing test directory /tmp/dash_func_test_h5hfluy1
    2024-06-17T17:59:36.892000Z TestFramework (INFO): Read headers data
    2024-06-17T17:59:36.895000Z TestFramework (INFO): Feed all non-fork headers, including and up to the first checkpoint
    2024-06-17T17:59:38.411000Z TestFramework (ERROR): Assertion failed
    Traceback (most recent call last):
      File "/src/dash/test/functional/test_framework/test_framework.py", line 158, in main
        self.run_test()
      File "./test/functional/p2p_dos_header_tree.py", line 53, in run_test
        assert {
    AssertionError
    2024-06-17T17:59:38.913000Z TestFramework (INFO): Stopping nodes
    2024-06-17T17:59:39.917000Z TestFramework (WARNING): Not cleaning up dir /tmp/dash_func_test_h5hfluy1
    2024-06-17T17:59:39.917000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/dash_func_test_h5hfluy1/test_framework.log
    2024-06-17T17:59:39.917000Z TestFramework (ERROR):
    2024-06-17T17:59:39.917000Z TestFramework (ERROR): Hint: Call /src/dash/test/functional/combine_logs.py '/tmp/dash_func_test_h5hfluy1' to consolidate all logs
    2024-06-17T17:59:39.917000Z TestFramework (ERROR):
    2024-06-17T17:59:39.917000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    2024-06-17T17:59:39.917000Z TestFramework (ERROR): https://github.com/dashpay/dash/issues
    2024-06-17T17:59:39.917000Z TestFramework (ERROR):
    ```

    </details>

  * [bitcoin#25454](bitcoin#25454) has a goal similar to [dash#2032](#2032) (and its predecessor, [dash#1589](#1589)), namely, avoiding `getheaders`(`2`) duplication to the same peer. Unfortunately, Dash's mitigation seems to conflict with Bitcoin's mitigation and this results in `feature_minchainwork.py` failing (see below). This has been remedied by partially reverting [dash#2032](#2032).

    <details>

    <summary>Test Failure</summary>

    ```
    dash@1bebec413839:/src/dash$ ./test/functional/feature_minchainwork.py
    2024-08-01T17:29:41.116000Z TestFramework (INFO): Initializing test directory /tmp/dash_func_test_co8xkx43
    2024-08-01T17:29:42.145000Z TestFramework (INFO): Testing relay across node 1 (minChainWork = 101)
    2024-08-01T17:29:42.145000Z TestFramework (INFO): Generating 49 blocks on node0
    [...]
    2024-08-01T17:29:51.707000Z TestFramework (INFO): Generating one more block
    2024-08-01T17:29:51.709000Z TestFramework (INFO): Verifying nodes are all synced
    2024-08-01T17:30:51.989000Z TestFramework (ERROR): Assertion failed
    Traceback (most recent call last):
      File "/src/dash/test/functional/test_framework/test_framework.py", line 159, in main
        self.run_test()
      File "./test/functional/feature_minchainwork.py", line 101, in run_test
        self.sync_all()
      File "/src/dash/test/functional/test_framework/test_framework.py", line 811, in sync_all
        self.sync_blocks(nodes)
      File "/src/dash/test/functional/test_framework/test_framework.py", line 777, in sync_blocks
        raise AssertionError("Block sync timed out after {}s:{}".format(
    AssertionError: Block sync timed out after 60s:
      '00ab061e30aebd2f97153d26cd72f921af896f05c6469ad73c7de4fc283d9590'
      '00ab061e30aebd2f97153d26cd72f921af896f05c6469ad73c7de4fc283d9590'
      '000008ca1832a4baf228eb1553c03d3a2c8e02399550dd6ea8d65cec3ef23d2e'
    2024-08-01T17:30:52.490000Z TestFramework (INFO): Stopping nodes
    2024-08-01T17:30:53.495000Z TestFramework (WARNING): Not cleaning up dir /tmp/dash_func_test_co8xkx43
    2024-08-01T17:30:53.495000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/dash_func_test_co8xkx43/test_framework.log
    2024-08-01T17:30:53.495000Z TestFramework (ERROR):
    2024-08-01T17:30:53.495000Z TestFramework (ERROR): Hint: Call /src/dash/test/functional/combine_logs.py '/tmp/dash_func_test_co8xkx43' to consolidate all logs
    2024-08-01T17:30:53.495000Z TestFramework (ERROR):
    2024-08-01T17:30:53.495000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    2024-08-01T17:30:53.495000Z TestFramework (ERROR): https://github.com/dashpay/dash/issues
    2024-08-01T17:30:53.495000Z TestFramework (ERROR):
    ```

    </details>

  * [bitcoin#25720](bitcoin#25720) introduces a new test, `p2p_initial_headers_sync.py`, to validate that when a client has a stale tip, it will only engage in headers sync with one peer (emit a `getheaders2`\* message).

    Unmodified, this test fails (see below) as while the backport deals with one source of `getheaders2` messages, the test setup unwittingly triggers another ([source](https://github.com/dashpay/dash/blob/2379462294da884d925b5a80acec20fbc360309f/src/net_processing.cpp#L5446-L5448)), specifically, allowing the `pindexBestHeader->GetBlockTime() > GetAdjustedTime() - nMaxTipAge` condition to evaluate `true`.

    This is because, unlike in Bitcoin test suite's `setup_chain()` ([source](https://github.com/bitcoin/bitcoin/blob/22d96d76ab02fc73e7fe0d810bacee4c982df085/test/functional/test_framework/test_framework.py#L379-L385)), Dash sets the mocktime to match the mock chain ([source](https://github.com/dashpay/dash/blob/2379462294da884d925b5a80acec20fbc360309f/test/functional/test_framework/test_framework.py#L408-L416)) during setup, while the test assumes that the mock chain is stale enough to not trigger this source of `getheaders2` messages.

    As the tip is barely stale, it emits the `getheaders2` message, which is detected, causing the test to fail. This has been remedied by overriding `setup_chain()` to behave more like Bitcoin's test suite.

    _\* - `getheaders2` is a Dash-specific message that is courtesy of compressed headers, Bitcoin would be checking for `getheaders`_

    <details>

    <summary>Test Failure</summary>

    ```
    dash@6a2649cc721f:/src/dash$ ./test/functional/p2p_initial_headers_sync.py
    2024-06-17T21:24:09.921000Z TestFramework (INFO): Initializing test directory /tmp/dash_func_test_3gypo3ab
    2024-06-17T21:24:10.681000Z TestFramework (INFO): Adding a peer to node0
    2024-06-17T21:24:11.684000Z TestFramework (INFO): Connecting two more peers to node0
    2024-06-17T21:24:13.689000Z TestFramework (INFO): Verify that peer2 and peer3 don't receive a getheaders after connecting
    2024-06-17T21:24:15.193000Z TestFramework (ERROR): Assertion failed
    Traceback (most recent call last):
      File "/src/dash/test/functional/test_framework/test_framework.py", line 158, in main
        self.run_test()
      File "./test/functional/p2p_initial_headers_sync.py", line 60, in run_test
        assert "getheaders2" not in peer2.last_message
    AssertionError
    2024-06-17T21:24:15.695000Z TestFramework (INFO): Stopping nodes
    2024-06-17T21:24:16.698000Z TestFramework (WARNING): Not cleaning up dir /tmp/dash_func_test_3gypo3ab
    2024-06-17T21:24:16.698000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/dash_func_test_3gypo3ab/test_framework.log
    2024-06-17T21:24:16.699000Z TestFramework (ERROR):
    2024-06-17T21:24:16.699000Z TestFramework (ERROR): Hint: Call /src/dash/test/functional/combine_logs.py '/tmp/dash_func_test_3gypo3ab' to consolidate all logs
    2024-06-17T21:24:16.699000Z TestFramework (ERROR):
    2024-06-17T21:24:16.699000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    2024-06-17T21:24:16.699000Z TestFramework (ERROR): https://github.com/dashpay/dash/issues
    2024-06-17T21:24:16.699000Z TestFramework (ERROR):
    ```

    </details>

  ## Checklist:

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

Top commit has no ACKs.

Tree-SHA512: 16b7c42195e197e8b6800c3425b4ff15a124b0e3e0da5c98ca6e22486b52c4ea82f2f05b83e28e838860b1ce76daa1e435c5eae4fb7591a161f26a5fff6189cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants