
eth/downloader: concurrent header downloads #2315

Merged: 3 commits merged into ethereum:develop from concurrent-headers-2 on May 20, 2016

Conversation

karalabe (Member) commented Mar 9, 2016

Geth currently pulls the entire header chain from a single peer, and uses all of its peers to fill in the block bodies, receipts, state, etc. that the header chain defines. This unfortunately introduces a serious bottleneck for newly joining peers: if they select a shitty peer to pull the headers from, the entire sync takes ages.

The obvious solution is to download headers from multiple peers concurrently, but that also has its gotchas, specifically that consecutive headers downloaded from different peers might not match up with each other. This might happen due to malicious peers deliberately feeding junk, or simply due to a large enough fork (e.g. the homestead transition)... and this is in fact exactly what bit the C++ codebase, causing hard crashes during the testnet homestead fork. The important lesson is that we need a mechanism to easily check whether a batch of headers (seemingly) fits into our chain or not, without needing to actually import it.

Another important property we need to maintain is that even if we have bad peers, we should be able to sync in a reasonable time. If we keep piecing together malicious header batches and throwing them away, valid downloaded data will also be lost and sync time will inherently suffer. This is currently avoided by the single-peer header download, which ensures that if we "happen upon" a good peer, bad peers cannot affect us any more (and bad peers get thrown out quite quickly). The important takeaway is that if a single peer drives the header downloads, and it is good, bad peers cannot screw us any more, so it's essential to retain this capability even with concurrent header downloads.

Algorithm

The solution implemented by this PR is based on the concept of a header skeleton:

  • A single master-peer is used to retrieve a skeleton of the header chain: only every Nth header is retrieved, forming gaps of N-1 headers.
  • Concurrently all the peers are used to fill in the missing N-1 headers, each batch consisting of the N-1 unknown headers, and also the Nth known header as the last one.

With this algorithm the concurrently pulled headers are guaranteed to fit the skeleton:

  • The batch of headers must define a proper parent->child number and hash progression.
  • The first header of the batch must have the correct starting number of the header gap.
  • The last header of the batch must be the one already defined by the skeleton.

With the above algorithm, we can guarantee that only headers that map cleanly onto the master-peer's skeleton will be accepted, and we can do this with only lightweight hashing, without needing to process anything. Thus if the master-peer is good, sync completes fast and clean.
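
For illustration, a minimal self-contained sketch of the three checks above. The types and the name verifyGapFill are stand-ins chosen for this example; the real downloader code is structured differently.

package main

import (
	"errors"
	"fmt"
)

type header struct {
	Number     uint64
	Hash       [32]byte
	ParentHash [32]byte
}

// verifyGapFill checks that a batch cleanly plugs one skeleton gap: it starts
// at the gap's first block number, forms a proper parent->child chain, and
// ends on the already known skeleton header bounding the gap.
func verifyGapFill(batch []header, gapStart uint64, bound header) error {
	if len(batch) == 0 || batch[0].Number != gapStart {
		return errors.New("batch does not start at the gap's first header")
	}
	for i := 1; i < len(batch); i++ {
		if batch[i].Number != batch[i-1].Number+1 || batch[i].ParentHash != batch[i-1].Hash {
			return errors.New("batch is not a contiguous parent->child chain")
		}
	}
	if batch[len(batch)-1].Hash != bound.Hash {
		return errors.New("batch does not end on the known skeleton header")
	}
	return nil
}

func main() {
	bound := header{Number: 192, Hash: [32]byte{0xaa}}
	batch := []header{{Number: 190}, {Number: 191}}
	// Rejected: the last delivered header is not the skeleton header for this gap.
	fmt.Println(verifyGapFill(batch, 190, bound))
}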

On the other hand, if the master-peer is bad and tries to feed an invalid skeleton chain, either no peers can send us the correct data to fill the gaps (in which case we assume an attack and drop the master-peer), or the master-peer itself feeds us junk, which we detect later during processing and again drop the master-peer.

Caveats

The eth protocol doesn't support request/response IDs, so per peer we can only have one outstanding request of a given type (e.g. a header request) and still be able to match the replies up with the requests. This means that we cannot dedicate the master-peer to retrieving only skeleton headers and use the others to fill the gaps, since we might only have a single peer (e.g. in a private network setting). To work around this, the PR retrieves headers in batches:

  • Fetch a small skeleton (e.g. N headers, with N-1 gaps in between = ~N^2 sized skeleton); see the request sketch after this list
  • Fill in the gaps with every peer, including the chosen master-peer that defined the skeleton
  • Deliver the entire filled skeleton (i.e. proper header chain) for processing
  • Repeat with the next batch if everything went well
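
To make the batching concrete, here is a rough self-contained sketch of how one batch could be phrased as eth-protocol style header requests (origin, amount, skip, reverse). The headersRequest struct and the concrete numbers (192-header gaps, 128 skeleton entries per batch) are stand-ins for this example, not necessarily the constants the downloader ends up using.

package main

import "fmt"

// headersRequest mirrors the shape of a GetBlockHeaders query: which header
// to start from, how many to return, how many to skip in between, and the order.
type headersRequest struct {
	Origin  uint64
	Amount  uint64
	Skip    uint64
	Reverse bool
}

func main() {
	const (
		gap      = 192 // the "N" of the description: one skeleton header every gap blocks
		skeleton = 128 // skeleton entries requested from the master peer per batch
	)
	from := uint64(1000001) // first unknown header of this batch

	// Master peer: only every gap-th header, spanning roughly gap*skeleton headers.
	master := headersRequest{Origin: from + gap - 1, Amount: skeleton, Skip: gap - 1}
	fmt.Printf("skeleton request: %+v\n", master)

	// Any peer (master included): fill the first gap with a contiguous run of
	// gap headers, the last of which must equal the first skeleton header.
	fill := headersRequest{Origin: from, Amount: gap, Skip: 0}
	fmt.Printf("gap-fill request: %+v\n", fill)
}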

Although the above batching solution works, it also introduces an annoying latency when one batch is finished and another is started: while a skeleton batch is being filled, other parts of the downloader might starve (e.g. receipts aren't being pulled). This however can be solved quite easily by checking whether a header delivery fills in a prefix of the skeleton and immediately streaming the already completed parts to the rest of the downloader, without waiting for all the subsequent gaps to be done.
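
A small self-contained sketch of that prefix-streaming idea, using stand-in types and names (headerStub, streamPrefix) rather than the downloader's actual API: as soon as a delivery completes a gap, the longest contiguous run of filled gaps from the front of the skeleton is handed onward instead of waiting for the whole batch.

package main

import "fmt"

type headerStub struct{ Number uint64 }

// streamPrefix returns the headers of all consecutively filled gaps starting
// at index processed, together with the new processed index, so the caller
// can forward them to the rest of the downloader immediately.
func streamPrefix(gaps [][]headerStub, processed int) ([]headerStub, int) {
	var ready []headerStub
	for processed < len(gaps) && gaps[processed] != nil {
		ready = append(ready, gaps[processed]...)
		processed++
	}
	return ready, processed
}

func main() {
	// Three gaps of the current batch: the first two are filled, the third is pending.
	gaps := [][]headerStub{{{1}, {2}}, {{3}, {4}}, nil}
	ready, done := streamPrefix(gaps, 0)
	fmt.Println(len(ready), "headers ready to stream, gaps consumed:", done)
}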

Performance

With this PR, syncing the current testnet (~781K blocks) took 7 minutes and 50 seconds.

karalabe mentioned this pull request Mar 9, 2016
karalabe added this to the 1.5.0 milestone Mar 9, 2016
karalabe force-pushed the concurrent-headers-2 branch from 74a736d to 88a7812 on March 9, 2016 15:30
codecov-io commented Mar 9, 2016

Current coverage is 56.20%

Merging #2315 into develop will increase coverage by 0.33%

@@            develop      #2315   diff @@
==========================================
  Files           215        215          
  Lines         24238      24457   +219   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          13541      13745   +204   
- Misses        10694      10709    +15   
  Partials          3          3          

Powered by Codecov. Last updated by e798e4f...982cd37

karalabe force-pushed the concurrent-headers-2 branch 3 times, most recently from 1b9b9a4 to c79b32c on March 11, 2016 14:50
karalabe force-pushed the concurrent-headers-2 branch from c79b32c to 5eeb60b on April 19, 2016 07:29
karalabe (Member, Author)

Reviewers: @fjl @obscuren ;)

karalabe changed the title from "eth/downloader: implement concurrent header downloads" to "eth/downloader: concurrent header downloads" on Apr 19, 2016

// No skeleton retrieval can be in progress, fail hard if so (huge implementation bug)
if q.headerResults != nil {
	panic("skeleton assembly already in progress")
}
Contributor

Considered using glog.Fatal?

Question: would it matter if you didn't get the entire trace? Since Go 1.6 it only lists the current goroutine.

karalabe (Member, Author)

This check is mostly for code quality reasons, so that if we screw up the code in the future it fails hard. I don't think the stack traces alone would be enough to figure out the chain of events that led to this scenario. We can change it of course, a plain panic just seems cleaner than a log-embedded one.

Contributor

I still think it's worth using glog instead of a naked panic. While we can grep or ack on the string, it would be nice to see where the panic originated from in the message itself.

obscuren (Contributor)

What is causing the following:

I0511 11:06:03.496955 core/headerchain.go:294] imported 192 header(s) (0 ignored) in 56.701701ms. #1183424 [c6c6bb98… / 79cac71d…]
I0511 11:06:41.861371 eth/downloader/downloader.go:1541] Rolled back 2048 headers (LH: 1183424->1181376, FB: 1149170->1149170, LB: 0->0)
I0511 11:06:44.816239 eth/downloader/downloader.go:1012] Peer 096e0fa65b534195 [headers 0.00/s, blocks 0.00/s, receipts 0.00/s, states 0.00/s, lacking    0]: empty head header set
I0511 11:07:04.751746 eth/downloader/downloader.go:1012] Peer 60f824e3e5c6ef51 [headers 0.00/s, blocks 0.00/s, receipts 0.00/s, states 0.00/s, lacking    0]: empty head header set

I0511 11:07:34.750366 eth/downloader/downloader.go:1012] Peer 3f94cf46a096bd6a [headers 0.00/s, blocks 0.00/s, receipts 0.00/s, states 0.00/s, lacking    0]: empty head header set
I0511 11:08:14.703504 eth/downloader/downloader.go:1012] Peer 48e673712367df09 [headers 0.00/s, blocks 0.00/s, receipts 0.00/s, states 0.00/s, lacking    0]: empty head header set
I0511 11:08:25.693052 core/headerchain.go:294] imported 0 header(s) (192 ignored) in 2.031727ms. #1149362 [a281858f… / 05444286…]

Failure:

I0511 10:48:34.649187 eth/downloader/downloader.go:301] Block synchronisation started
...
I0511 11:22:32.253577 core/headerchain.go:294] imported 2048 header(s) (0 ignored) in 4.44077446s. #1496439 [241febf7… / 616ba58f…]
I0511 11:22:32.364642 eth/downloader/downloader.go:1541] Rolled back 2048 headers (LH: 1496439->1494391, FB: 1491496->1491496, LB: 0->0)
I0511 11:22:37.683491 core/blockchain.go:959] imported 2 block(s) (0 queued 0 ignored) including 0 txs in 821.480261ms. #2 [88e96d45 / b495a1d7]

fjl mentioned this pull request May 14, 2016
karalabe force-pushed the concurrent-headers-2 branch from b46d2a1 to b0463c8 on May 17, 2016 06:59
karalabe force-pushed the concurrent-headers-2 branch from b0463c8 to e86619e on May 17, 2016 07:03
karalabe (Member, Author)

My guess is that the last error was caused by the pile of transactions and the huge network delays caused by it. With that being fixed, I think this PR is ready. @fjl @obscuren

idle := func(p *peer) bool {
	return atomic.LoadInt32(&p.headerIdle) == 0
}
throughput := func(p *peer) float64 {
Contributor

Why isn't this a method on the peer itself? It seems a bit odd to create a lambda when it's mandatory that all peers should be able to return their header throughput anyway. If a peer isn't considered header-idle, perhaps it should return 0?

karalabe (Member, Author)

Can't say why it was originally done like this. This specific version is just in line with the rest of the method closures, but I don't see why all of the closures couldn't be separated out as fully qualified methods on the peer. We can do that in a follow-up PR if need be; I'd rather not modify all the other code too in this PR.
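
For illustration, a self-contained sketch of what the method-based variant suggested here might look like; the peer struct, field names, and methods below are stand-ins for this example, not the downloader's actual types.

package main

import (
	"fmt"
	"sync/atomic"
)

type peer struct {
	headerIdle       int32   // 0 = idle, 1 = a header request is currently in flight
	headerThroughput float64 // measured header download rate (headers/s)
}

// HeaderIdle reports whether the peer can accept a new header request.
func (p *peer) HeaderIdle() bool {
	return atomic.LoadInt32(&p.headerIdle) == 0
}

// HeaderThroughput returns the peer's measured header throughput, or 0 for
// busy peers so they naturally sort last when assigning new work.
func (p *peer) HeaderThroughput() float64 {
	if !p.HeaderIdle() {
		return 0
	}
	return p.headerThroughput
}

func main() {
	p := &peer{headerThroughput: 321.5}
	fmt.Println(p.HeaderIdle(), p.HeaderThroughput())
}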

obscuren (Contributor)

👍

karalabe force-pushed the concurrent-headers-2 branch from 982cd37 to 8906b2f on May 20, 2016 08:01
obscuren merged commit a8472e0 into ethereum:develop May 20, 2016
obscuren removed the review label May 20, 2016
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Apr 17, 2024