Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove delayed lookups #4992

Merged
merged 4 commits into from
Dec 8, 2023
Merged

Conversation

realbigsean
Copy link
Member

Issue Addressed

resolves #4884
resolves #4883
resolves #4787

Description

The benefits of making RPC requests for blobs/blocks based on seeing a single blob/block on gossip are dubious. They are very unlikely to make the difference between you attesting correctly and incorrectly. This is because we observe RPC requests taking about 1 second:

Sep 25 18:14:26.001 DEBG Sending BlobsByRoot Request
Sep 25 18:14:27.241 TRCE Received BlobsByRoot Response

So we need to make this query by around 2s to realistically have a shot at attesting to the block. It's very likely the missing blob just arrives on gossip. If it doesn't end up arriving on gossip, it's unlikely the block will be accepted by the network. And in a scenario where the block is accepted and we didn't see it on gossip. We will request from a peer who we see attest to it or build on it, as we do today and will be able to recover. Not to mention there are clients who attest as soon as the get the full block, so it's still possible that we trigger an RPC request for a block/blobs early and can complete it in time to attest. If a block or blobs failed to propagate to us on gossip, do we want to make an extreme effort to vote for it? Should we in fact be voting for a skipped slot?

A problem with trying to make an RPC request based just on gossip blob/block parts is that we have no idea which peers might have the block components we need. So we end up asking the peer who sent us a single block/blob for all other parts, thinking perhaps they have a good view of the network. But if this peer received the blob via gossip, why wouldn't they have forwarded it?

The downsides to making these delayed/optimistic lookup requests include:

  • More complicated peer scoring. We are currently incorrectly downscoring peers in slashable blob scenarios. This would correct that.
  • the delayed lookup logic frequently causes us to double download blobs because we get the blob over gossip while we are downloading via RPC

Additional Info

Note that an unknown parent from a block component on gossip will still immediately trigger a lookup.

Turns out this delayed lookup logic is actually kind of what we want for DAS. In DAS, however we will have some way of knowing which peers should have the block components we need. So we aren't guessing blindly. The logic here for triggering the lookups will be useful as well as the PeerShouldHave enum. Because we may not want to downscore peers we are sampling from.

@divagant-martian
Copy link
Collaborator

awesome to see code complexity reduced this much 🚀

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, really glad we could get rid of all this extra logic.
Gonna try and run this on devnet for a bit to see if there are any unexpected surprises.

@pawanjay176
Copy link
Member

This is looking good on the devnet, I'm happy to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants