eth/downloader: concurrent header downloads #2315
Conversation
force-pushed from 74a736d to 88a7812
Current coverage is 56.20%

```
@@            develop    #2315    diff @@
==========================================
  Files           215      215
  Lines         24238    24457    +219
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          13541    13745    +204
- Misses        10694    10709     +15
  Partials          3        3
```
force-pushed from 1b9b9a4 to c79b32c
force-pushed from c79b32c to 5eeb60b
```go
// No skeleton retrieval can be in progress, fail hard if so (huge implementation bug)
if q.headerResults != nil {
	panic("skeleton assembly already in progress")
}
```
Considered using glog.Fatal? Question: would it matter if you didn't get the entire trace? Since Go 1.6 a panic only lists the current goroutine.
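For reference, a minimal standalone sketch contrasting the two options being discussed (the queue type and field here are stand-ins, not the real downloader types):

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

// queue is a stand-in for the real downloader queue, reproduced only to
// frame the nil-check from the diff.
type queue struct {
	headerResults []int // placeholder for the real field type
}

func (q *queue) scheduleSkeleton() {
	if q.headerResults != nil {
		// Option A: a naked panic; since Go 1.6 this prints only the
		// current goroutine's stack by default.
		// panic("skeleton assembly already in progress")

		// Option B: glog.Fatal prefixes the message with file:line, dumps
		// the stacks of all running goroutines, then exits.
		glog.Fatal("skeleton assembly already in progress")
	}
}

func main() {
	flag.Parse() // glog registers flags such as -logtostderr
	q := &queue{headerResults: make([]int, 1)}
	q.scheduleSkeleton()
}
```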
This check is mostly for code quality reasons, so that if we screw up the code in the future it fails hard. I don't think the stack traces alone would be enough to figure out the sequence of events that led to this scenario. We can change it of course; it just seems a plain panic is cleaner than a log-embedded one.
I still think it's worth using glog instead of a naked panic. While we can grep or ack on the string, it would be nice to see where the panic originated from in the message itself.
What is causing the following failure:
force-pushed from b46d2a1 to b0463c8
force-pushed from b0463c8 to e86619e
```go
idle := func(p *peer) bool {
	return atomic.LoadInt32(&p.headerIdle) == 0
}
throughput := func(p *peer) float64 {
```
Why isn't this a method on the peer itself? It seems a bit odd to create a lambda when it's mandatory that all peers should be able to return their header throughput anyway. If a peer isn't considered a header-idle peer, perhaps it should return 0?
Can't say why it was originally done like this. This specific version is in line with the rest of the method closures, but I don't see why all of the closures couldn't be separated out as fully qualified methods on the peer. We can do that in a follow-up PR if need be; I'd rather not modify all the other code too in this PR.
👍
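For illustration, a hedged sketch of what hoisting the closures into peer methods could look like (the field names are assumptions taken from the snippet above, not the final API):

```go
package downloader

import "sync/atomic"

// peer is a pared-down stand-in; the field names mirror the snippet above,
// but the real type carries much more state (and guards throughput with a
// lock, omitted here for brevity).
type peer struct {
	headerIdle       int32   // 0 == idle, 1 == a header request is in flight
	headerThroughput float64 // measured headers/sec from past deliveries
}

// HeaderIdle reports whether the peer can accept a new header request.
func (p *peer) HeaderIdle() bool {
	return atomic.LoadInt32(&p.headerIdle) == 0
}

// HeaderThroughput returns the peer's measured header throughput, or 0 for
// a busy peer, per the suggestion above.
func (p *peer) HeaderThroughput() float64 {
	if !p.HeaderIdle() {
		return 0
	}
	return p.headerThroughput
}
```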
force-pushed from 982cd37 to 8906b2f
Geth currently uses a single peer to pull all the headers of the blockchain from, and uses all of its peers to fill in the block bodies, receipts, state, etc. that the header chain defines. This unfortunately introduces a serious bottleneck for newly joining peers: if they select a shitty peer to pull the headers from, the entire sync takes ages.
The obvious solution is to download headers from multiple peers concurrently, but that also has its gotchas, specifically that consecutive headers downloaded from different peers might not match up with each other. This might happen due to malicious peers deliberately feeding junk, or simply due to a large enough fork (e.g. the homestead transition)... and this was actually something the C++ codebase was bitten by, causing hard crashes during the testnet homestead fork. The important lesson is that we need a mechanism to easily check whether a batch of headers (seemingly) fits into our chain or not, without needing to actually import it.
Another important property we need to maintain is that even if we have bad peers, we should be able to sync in a reasonable time. If we keep piecing together malicious header batches and throwing them away, valid downloaded data is also lost and sync time inherently suffers. This is currently avoided by the single-peer header download, which ensures that if we "happen upon" a good peer, bad peers cannot affect us any more (and bad peers get thrown out quite quickly). The important takeaway is that if a single peer drives the header downloads, and it is good, bad peers cannot screw us any more, so it's essential to retain this property even with concurrent header downloads.
Algorithm
The solution implemented by this PR is based on the concept of a header skeleton: the master peer is only asked for a sparse skeleton of the chain (every K-th header), and all the other peers are tasked with filling in the batches of headers between consecutive skeleton headers.
With this algorithm the concurrently pulled headers are guaranteed to fit the skeleton: a filled batch is only accepted if each of its headers links to its predecessor via the parent hash, and if the batch as a whole links up with the skeleton headers on both sides.
With the above algorithm, we can guarantee that only headers that map cleanly onto the master peer's skeleton will be accepted, and we can do this with only lightweight hashing, without needing to process anything. Thus if the master peer is good, sync completes fast and clean.
On the other hand, if the master peer is bad and tries to feed us an invalid skeleton chain, either no peers can send us the correct data to fill the gaps (in which case we assume an attack and drop the master peer), or the master peer itself feeds us junk, which we can detect later during processing and drop the master peer for.
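To make the fitting rule concrete, here is a simplified sketch of the hash-linking check (the header type and function are illustrative stand-ins, not the actual queue code):

```go
package downloader

import (
	"bytes"
	"errors"
)

// header is a pared-down stand-in for the real types.Header.
type header struct {
	Hash       []byte // hash of this header
	ParentHash []byte // hash of its predecessor
}

// verifyFill checks that a batch of headers delivered by an arbitrary peer
// fits between two consecutive skeleton headers: each header must link to
// its predecessor via ParentHash, and the next skeleton header must link
// to the batch's last header. Only hashing is involved, no chain import.
func verifyFill(prevSkeleton *header, batch []*header, nextSkeleton *header) error {
	prev := prevSkeleton
	for _, h := range batch {
		if !bytes.Equal(h.ParentHash, prev.Hash) {
			return errors.New("header batch broke the hash chain")
		}
		prev = h
	}
	if !bytes.Equal(nextSkeleton.ParentHash, prev.Hash) {
		return errors.New("header batch does not link into the skeleton")
	}
	return nil
}
```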
Caveats
The eth protocol doesn't support request/response IDs, so we can only have one request of a given type (e.g. a header request) in flight per peer and still be able to match up the replies with the requests. This means we cannot dedicate the master peer to skeleton retrieval only and use the others to fill the gaps, since we might have only a single peer (e.g. in a private network setting). To work around this, the PR retrieves headers in batches: the master peer fetches one batch of skeleton headers, then joins the other peers in filling the gaps, and only requests the next skeleton batch once the current one is assembled.
Although the above batching solution works, it also introduces an annoying latency when one batch is finished and another is started, whereby during the filling of a skeleton batch other parts of the downloader might starve (e.g. receipts aren't being pulled). This however can be solved quite easily by checking whether a header delivery completes a prefix of the skeleton and immediately streaming the already completed parts to the rest of the downloader, without waiting for all the subsequent gaps to be filled.
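A simplified sketch of the prefix-streaming idea (the gap bookkeeping is invented for illustration and reuses the header stand-in from the sketch above; the real queue is considerably more involved):

```go
// filledGap is a stand-in for one inter-skeleton gap; the real queue also
// tracks requests, timeouts and peer assignments.
type filledGap struct {
	headers []*header // headers delivered for this gap
	done    bool      // whether the gap is completely filled
}

// completedPrefix pops every gap from the front of the skeleton that is
// already filled and returns its headers, so the body/receipt fetchers can
// start on them immediately instead of idling until the whole skeleton
// batch is assembled.
func completedPrefix(gaps []*filledGap) (ready []*header, rest []*filledGap) {
	i := 0
	for ; i < len(gaps) && gaps[i].done; i++ {
		ready = append(ready, gaps[i].headers...)
	}
	return ready, gaps[i:]
}
```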
Performance
With this PR, syncing the current testnet (~781K blocks) took 7 minutes and 50 seconds.