
BlocksByRange under WS #2131

Merged · 7 commits merged into dev on May 12, 2021

Conversation

@djrtwo (Contributor) commented Nov 12, 2020

Partially addresses #2116

  • Defines MIN_EPOCHS_FOR_BLOCK_REQUESTS as the minimum epoch range over which a node must serve blocks (and thus how far back a new node must backfill)
  • Does not define a way to advertise this in Req/Resp (e.g. in Status) or in ENR
    • I personally think it unwise to modify Status or MetaData (likely with a protocol ID update) this near to genesis launch. MIN_EPOCHS_FOR_BLOCK_REQUESTS gives us 4+ months to upgrade the req/resp to make this info available.

The main thing the above solution won't capture during the first 4 months is the case where a node is back-filling blocks from a checkpoint state to genesis. In that case, the peer might not be able to respond to BlocksByRange requests successfully. To handle this, I suggest we spec an error code for this case; it will carry forward in the future when BlocksByRange requests are made outside of the advertised range. We can either use 2 (ServerError) or define 3 (ResourceUnavailable).

Questions:

  1. Are we okay leaving out advertisement for earliest epoch for block requests until a couple of months after mainnet?
  2. Good with adding the error? If so, use 2 or define 3?

@mbaxter (Contributor) commented Nov 12, 2020

  1. I think it's fine to defer advertising the earliest available epoch - it does seem like the wrong time to be pushing this change in :).
  2. The error makes sense to me, I vote for 3 (ResourceUnavailable).

@AgeManning (Contributor)

  1. I'm fine with this
  2. Agree with 3 (ResourceUnavailable)

@arnetheduck (Contributor)

  1. OK.
  2. The wording of the text suggests that clients must offer up blocks. Adding the possibility to claim ResourceUnavailable allows clients to keep claiming that indefinitely; the correct behaviour would be to simply sync from MIN_EPOCHS_FOR_BLOCK_REQUESTS and onwards instead of "backfilling". This lets the client report a "correct" head in the existing Status request without the loophole, at the cost of slightly slowing down startup. We can deliberate the addition of an error code when extending Status/MetaData in the 4-5 months mentioned; until then, the language in this PR doesn't really allow a conforming and honest client to return it ("must backfill" should, however, be replaced by "must sync from"). One issue is that this slows down startup slightly because the client must now sync more blocks, but on the other hand it benefits the network because we more strongly enshrine a supply of synced nodes. Faster syncing lies in the domain of light clients and can/should be solved then, along with archive solutions.

@nisdas (Contributor) commented Nov 15, 2020

  1. This sounds good, gives us a bit of time to sort it out.
  2. 3 (ResourceUnavailable) is an appropriate error code for this.

@ajsutton (Contributor)

the correct behaviour would be to simply sync from MIN_EPOCHS_FOR_BLOCK_REQUESTS an onwards instead of "backfilling"

This isn't possible. The sync has to start from the state provided by the user and go forward from there. Clients that respond with ResourceUnavailable would very likely be immediately disconnected as useless, so there's no advantage to claiming that indefinitely.

@AgeManning (Contributor)

Yeah, for the record: our current strategy is going to be to ban any node that sends us this error (at least for the first 5 months).

The ban lasts ~40 minutes, which should be enough time for the peer to backfill, etc.

@djrtwo (Contributor, Author) commented Nov 16, 2020

This isn't possible. The sync has to start from the state provided by the user and go forward from there.

Yes, this is correct. We can't enforce that users must show up with a state that is exactly MIN_EPOCHS_FOR_BLOCK_REQUESTS in the past. And if we did, we would be forcing them to choose a WS state that is at the bounds of being safe (and for small validator sets, just simply unsafe).

Note that for a backfill of blocks only to that boundary, the only validity check a node can do (without fetching an old state) is that these blocks form a hash chain. Because the WS state is trusted, this check is enough to show validity of the blocks for storage and future serving.

Note that if you backfill all the way to genesis, you could then check other validity conditions and could produce historic states.

@arnetheduck (Contributor) commented Nov 16, 2020

This isn't possible.

We can't enforce that users must show up with a state that is exactly MIN_EPOCHS_FOR_BLOCK_REQUESTS in the past.

Why are the two related? When we ask for BlocksByRange, we can simply start at slot wallslot - MIN_EPOCHS_FOR_BLOCK_REQUESTS and should hit the WS state along the way - i.e., it's still a "backfill" in that sense, except that it's done in order, and therefore it's more likely that the network keeps the relevant blocks around, per the 'MUST' requirement in this PR.

Because the WS state is trusted, this check is enough to show validity of the blocks for storage and future serving.

The client should also do a signature check, else it can be poisoned with blocks that have the correct root but an invalid signature; this should be possible since we have the proposer index from the block and the validator set from the WS state.

The point here is that if we want to ensure blocks stay around, that's more likely to happen if the easiest thing to do is to fetch and store them. Starting range sync from the WS state and then "maybe" backfilling makes it less likely that the blocks will stay around. To make the sync requirement even stronger, it would make some sense to bake in "probe" range requests even when synced, and to disconnect any client that does not serve the blocks; otherwise already-synced clients have little to lose. (Being disconnected by an unsynced client is a small loss; being disconnected by a synced client means a clear and present risk that your attestations will not make it through.)

@djrtwo (Contributor, Author) commented Dec 9, 2020

Why are the two related?

Because you have no way to validate a block from wallslot - MIN_EPOCHS_FOR_BLOCK_REQUESTS. You have no start block to check that it forms a chain, and you have no state to check state transition validity.

You can check that the blocks being served to you form a chain, and you can even validate the proposer signature against the validator set you have from your WS state, but you cannot validate that the sequence of blocks being sent to you is valid wrt the state transition, nor can you validate that the blocks will ultimately connect to the chain you decided was valid through your input of the WS state.

Thus an attacker can serve you many blocks before you can know whether they are valuable/valid, and so can cause you to consume extra bandwidth.

It should also do a signature check, else it can be poisoned with blocks that have the correct root but invalid signature - it should be possible since we have the proposer index from the block and the validator set of the WS state.

Yes, agreed. Because you are storing the SignedBeaconBlock, you'll want to validate that outer container.

bors bot pushed commits to sigp/lighthouse that referenced this pull request (Dec 13-15, 2020):

## Issue Addressed

Related to #1891. The error is not in the spec yet (see ethereum/consensus-specs#2131).

## Proposed Changes

Implement the proposed error, banning peers that send it.

## Additional Info

NA
@djrtwo djrtwo requested a review from ralexstokes May 11, 2021 17:30
@ajsutton (Contributor) left a comment:

LGTM.

There was at one point an error code that could be returned for any BlocksByRange request the node couldn't fulfil because it doesn't yet have the blocks. I still think that would be very useful: it makes explicit that the node doesn't have the blocks, rather than an empty response, which could mean either that it lacks the blocks or that there were no blocks in the range.

Review thread (resolved): specs/phase0/p2p-interface.md
@ralexstokes (Member) left a comment:

Generally looks good! Left some minor notes/comments.

Review threads (resolved): README.md, specs/phase0/p2p-interface.md (×2), specs/phase0/weak-subjectivity.md
@arnetheduck (Contributor)

Because you have no way to validate a block from wallslot - MIN_EPOCHS_FOR_BLOCK_REQUESTS. You have no start block to check that it forms a chain, and you have no state to check state transition validity.

This is something we can fix, I think: by turning historical_roots into historical_block_roots (without mixing in the state), we could validate groups of 8192 blocks at a time, which would put a good upper bound on the damage one can cause.

@djrtwo (Contributor, Author) commented May 12, 2021

Added the 3: ResourceUnavailable response code. I know we are generally pretty conservative about adding such codes, but 3 of 4 client teams are in vocal support.

@djrtwo djrtwo merged commit 84830e8 into dev May 12, 2021
@djrtwo djrtwo deleted the bbr-ws branch May 12, 2021 14:51