Re-org feature does not work with Vouch blinded proposals #4802

Open
michaelsproul opened this issue Oct 3, 2023 · 6 comments
Labels: bug (Something isn't working), builder API, HTTP-API, val-client (Relates to the validator client binary)

Comments

@michaelsproul
Member

Description

Vouch is currently not compatible with Lighthouse's late block re-org feature when proposing a blinded block. The only workaround is to either disable blinded block proposals, or disable the re-org feature with --disable-proposer-reorgs.

Detail

We've seen this issue occur in the wild, where it results in the re-orging block being rejected by Vouch, and a block with 0 transactions being signed instead:

  • Lighthouse suppressed the payload-attributes message to Geth because it wanted to re-org.
  • Vouch requested a blinded block.
  • Lighthouse requested a header from Vouch building on the parent of the current head (an attempted re-org). Vouch didn't return a header.
  • Lighthouse built a (re-orging) blinded block using a local payload.
  • Vouch rejected the blinded payload due to the execution payload not being as expected.
  • Vouch requested a full block and Lighthouse abandoned the re-org strategy because it was after the 1s cutoff. Lighthouse sent a (late) payload-build message to Geth, followed by a near-immediate request for that payload.
  • Geth didn't have any time to build a payload so it built one with 0 transactions.
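
The cutoff behaviour in the sequence above can be sketched roughly as follows. This is a simplified, hypothetical model rather than Lighthouse's actual logic: the function name, the `wants_reorg` flag and the example timings (0.3s and 1.4s into the slot) are assumptions made for illustration; only the 1s cutoff comes from the description.

```python
# Simplified, hypothetical model of the proposer's choice of parent block.
REORG_CUTOFF_S = 1.0  # the 1s cutoff mentioned in the sequence above


def choose_parent(seconds_into_slot: float, head: str, head_parent: str,
                  wants_reorg: bool) -> str:
    """Return the block to build on for a request arriving at the given time."""
    if wants_reorg and seconds_into_slot < REORG_CUTOFF_S:
        # Early enough in the slot: build on the parent of the head (re-org).
        return head_parent
    # Past the cutoff (or no re-org wanted): build on the current head.
    return head


# First request (blinded block) arrives early, the retry (full block) arrives late.
first = choose_parent(0.3, "head", "parent_of_head", wants_reorg=True)
retry = choose_parent(1.4, "head", "parent_of_head", wants_reorg=True)
print(first, retry)
```

In this model the retry lands on a different parent than the first attempt, which is why the execution client only learns late what it is supposed to build on.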

Version

v4.5.0

Steps to resolve

This issue will disappear once we support the v3 block endpoint, and Vouch makes use of it: #4629

The reason being that (I think) Vouch will no longer expect Lighthouse to use a specific payload that it has selected, so Lighthouse will be free to fetch one from mev-boost directly, or can build a local payload and return that immediately to Vouch (which can be accepted or rejected depending on its policy and other available blocks).

@michaelsproul michaelsproul added the bug Something isn't working label Oct 3, 2023
@adaszko
Contributor

adaszko commented Dec 19, 2023

I'm not entirely sure if it's related but I saw another logical path leading to the same symptoms (i.e. proposing an empty block) with Vouch.

The problem is that Lighthouse seems to pass an incorrect parentHash to Vouch in a request to build a block that's to be proposed. A re-org happens right before this (caused by extraneous circumstances), and I think it might be what triggers it. Here are the logs:

Dec 19 09:38:22 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:22.601 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 8012889, new_head: 0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17, previous_slot: 8012888, previous_head: 0x3c87d6993bdcc2553eba8e2ee5b274ee3b0a6098c0ebbd70df89e7a181330b5f, service: beacon
Dec 19 09:38:23 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:23.438 INFO Prepared beacon proposer                parent_root: 0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17, validator: 953658, prepare_slot: 8012890, service: beacon
Dec 19 09:38:23 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:23.438 WARN Delayed proposer preparation            validator: 953658, prepare_slot: 8012890, service: beacon
Dec 19 09:38:23 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:23.982 INFO Requesting blinded header from connected builder, parent_hash: 0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e, pubkey: 0xad779ae44bd3a468c970fecb2f759518497bee30abcbfbd11cad74194cd540c11c503489a4725a2b00a9d8ae8a1fdbaf, slot: Slot(8012890), service: exec
Dec 19 09:38:23 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:23.997 INFO Requested blinded execution payload     parent_hash: 0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e, local_response_ms: 9, local_fee_recipient: 0xffee087852cb4898e6c3532e776e68bc68b1143b, relay_response_ms: 14, relay_fee_recipient: request failed, service: exec
Dec 19 09:38:23 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:23.998 WARN Builder error when requesting payload   parent_hash: 0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e, local_block_hash: 0x6ea42fdca3fb3f5bca1ae8c987bd009541a886b3e59d5d06484090213820e4da, relay_error: ServerMessage(ErrorMessage { code: 500, message: "Failed to obtain bid", stacktraces: [] }), info: falling back to local execution client, service: exec
Dec 19 09:38:25 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:25.087 INFO New block received                      root: 0x8133ca1dce784b2e4fb5823729dae23fa2b0df95d626b451293225a57ec5d4c8, slot: 8012890
Dec 19 09:38:25 lighthouse-beacon-node-1 lighthouse[74220]: Dec 19 09:38:25.489 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/blocks, status: 202 Accepted, elapsed: 29.632227ms

On the Vouch side:

Dec 19 09:38:23 vouch-lighthouse-1 vouch[3087238]: {"level":"trace","service":"daemon","impl":"rest","time":"2023-12-19T09:38:23Z","message":"getBuilderBid called"}
Dec 19 09:38:23 vouch-lighthouse-1 vouch[3087238]: {"level":"trace","service":"blockrelay","impl":"standard","slot":8012890,"parent_hash":"0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e","pubkey":"0xad779ae44bd3a468c970fecb2f759518497bee30abcbfbd11cad74194cd540c11c503489a4725a2b00a9d8ae8a1fdbaf","time":"2023-12-19T09:38:23Z","message":"Builder bid called"}
Dec 19 09:38:23 vouch-lighthouse-1 vouch[3087238]: {"level":"debug","service":"blockrelay","impl":"standard","key":"8012890","subkey":"7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e:ad779ae44bd3a468c970fecb2f759518497bee30abcbfbd11cad74194cd540c11c503489a4725a2b00a9d8ae8a1fdbaf","time":"2023-12-19T09:38:23Z","message":"Builder bid not found (subkey)"}
Dec 19 09:38:23 vouch-lighthouse-1 vouch[3087238]: {"level":"error","service":"daemon","impl":"rest","error":"builder bid not known (subkey)","time":"2023-12-19T09:38:23Z","message":"Failed to obtain bid"}

Vouch's subkey is just sprintf("%s:%s", parentHash, pubkey), so Lighthouse passed 7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e as parentHash for slot 8012890 whereas the actual parentHash is 0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17 and it is actually mentioned in the Lighthouse re-org log entry!
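
As a rough illustration of that lookup, here is a Python rendition of the key format. The cache layout and the `lookup_bid` helper are hypothetical and are not Vouch's code; the `0x` prefix is stripped only because the subkey in the debug log line above appears without it.

```python
from typing import Dict, Optional

PARENT_HASH = "0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e"
PUBKEY = ("0xad779ae44bd3a468c970fecb2f759518497bee30abcbfbd11cad74194cd540c11c"
          "503489a4725a2b00a9d8ae8a1fdbaf")


def strip_0x(value: str) -> str:
    """Drop a leading 0x, matching the form of the subkey seen in the Vouch log."""
    return value[2:] if value.startswith("0x") else value


def subkey(parent_hash: str, pubkey: str) -> str:
    """Python equivalent of sprintf("%s:%s", parentHash, pubkey)."""
    return f"{strip_0x(parent_hash)}:{strip_0x(pubkey)}"


def lookup_bid(bids: Dict[int, Dict[str, dict]], slot: int,
               parent_hash: str, pubkey: str) -> Optional[dict]:
    """Hypothetical cache lookup: slot is the key, parent hash plus pubkey the subkey."""
    return bids.get(slot, {}).get(subkey(parent_hash, pubkey))


# A cache holding no bid for this parent hash produces the miss seen in the log.
bids: Dict[int, Dict[str, dict]] = {8012890: {}}
print(lookup_bid(bids, 8012890, PARENT_HASH, PUBKEY))
```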

The question is: Is what we're seeing here a race between Lighthouse's different subsystems' view of the current head?

PS. This happens with --disable-proposer-reorgs and --always-prefer-builder-payloads.

@michaelsproul
Member Author

Interesting find @adaszko!

What's happening here is a race between what Vouch thinks the head should be, and what it actually is according to Lighthouse. Lighthouse has more up to date info, and Vouch's info is stale.

> Vouch's subkey is just sprintf("%s:%s", parentHash, pubkey), so Lighthouse passed 7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e as parentHash for slot 8012890 whereas the actual parentHash is 0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17 and it is actually mentioned in the Lighthouse re-org log entry!

Lighthouse is trying to build a block on top of the block at slot 8012889 which has:

  • Block root (consensus): 0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17
  • Block hash (execution): 0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e

You mixed up the CL block root & the execution block hash, which is why you thought the values Lighthouse sent were inconsistent.
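
To make the distinction concrete, here is a small sketch with the two identifiers of the block at slot 8012889. The class and helper names are made up for illustration; the values come from the logs, where `parent_root` in the "Prepared beacon proposer" line is the consensus root and `parent_hash` in the builder request is the execution hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockIds:
    """Hypothetical summary of one beacon block's two identifiers."""
    slot: int
    block_root: str            # consensus-layer identifier of the beacon block
    execution_block_hash: str  # execution-layer hash of the payload it carries


head = BlockIds(
    slot=8012889,
    block_root="0x0b0a274dc8f5805b80611c08b66884740c18b4ee43deb133a7faad8a85758a17",
    execution_block_hash="0x7d083cb2caaad08d763ef6a5f585946a57f6df2cf9edf09b8291a0d27424737e",
)


def parent_root_for_proposal(parent: BlockIds) -> str:
    """What consensus-level log lines such as the re-org warning refer to."""
    return parent.block_root


def parent_hash_for_builder(parent: BlockIds) -> str:
    """What the builder header request is keyed on."""
    return parent.execution_block_hash


print(parent_root_for_proposal(head))
print(parent_hash_for_builder(head))
```

Both values describe the same parent block, so the builder request carrying the 0x7d08… hash is consistent with the re-org log naming 0x0b0a… as the new head.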

Lighthouse wants to build on 8012889, because it's the true head according to fork choice. Vouch refuses to play along with this choice of head, likely due to it not knowing that this is the head because of the late-in-the-slot reorg. I don't know how Vouch updates its view of the head, maybe @mcdee could shed some light on this. I suspect this race is hard to avoid perfectly, because if there's a reorg late enough in the slot, then the BN can always choose a different head to what Vouch expects. Jim and I have talked before about Vouch supporting building blinded blocks on multiple heads (for the re-org feature), and that would help here.

The other thing that helps is block v3. At the moment Vouch rejects any blinded block that doesn't match its choice of builder payload. With v3 the BN will no longer return bogus blinded blocks and will just return a full block, which Vouch should accept regardless of the head it builds on (else it is vulnerable to racing as above, and incompatible with the reorg feature). We're putting the finishing touches on block v3 and ironing out kinks in the spec, so it should be ready for testing (on testnets, and then maybe mainnet) after our next release. Tracking PR: #5015.
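
A minimal sketch of the difference in acceptance policy, assuming a heavily simplified model: the `Proposal` type and both functions are hypothetical and do not reflect Vouch's implementation, and `CHOSEN_BUILDER_HASH` is a made-up placeholder. The local block hash is the `local_block_hash` from the Lighthouse log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical view of a block returned by the BN."""
    blinded: bool
    payload_block_hash: str


def accept_current(proposal: Proposal, chosen_payload_hash: str) -> bool:
    """Today: a blinded block must carry the builder payload Vouch selected."""
    return proposal.payload_block_hash == chosen_payload_hash


def accept_with_v3(proposal: Proposal, chosen_payload_hash: str) -> bool:
    """With v3 (sketch): a full block is accepted whatever head it builds on."""
    if not proposal.blinded:
        return True
    return proposal.payload_block_hash == chosen_payload_hash


LOCAL_HASH = "0x6ea42fdca3fb3f5bca1ae8c987bd009541a886b3e59d5d06484090213820e4da"
CHOSEN_BUILDER_HASH = "0x" + "aa" * 32  # placeholder, not a real payload hash

# Today the locally built payload is wrapped in a blinded block and rejected.
print(accept_current(Proposal(True, LOCAL_HASH), CHOSEN_BUILDER_HASH))
# With v3 the same payload arrives as a full block and is accepted.
print(accept_with_v3(Proposal(False, LOCAL_HASH), CHOSEN_BUILDER_HASH))
```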

The other thing that's a bit suspicious is that your local Geth built such a bad payload. It should have had plenty of time to pack some transactions between when it was told about the reorg at 09:38:22.601 and when the payload was requested at 09:38:23.982 (1.3s later). Is your Geth completely starved of peers or running with its mempool turned off or something?
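
For reference, the gap between those two log timestamps can be checked with a few lines of Python. The variable names are illustrative; the timestamps are the ones on the "Beacon chain re-org" and "Requesting blinded header" log lines.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

reorg_logged = datetime.strptime("09:38:22.601", FMT)      # "Beacon chain re-org"
header_requested = datetime.strptime("09:38:23.982", FMT)  # "Requesting blinded header"

# Time the execution client had between the head change and the payload request.
elapsed = (header_requested - reorg_logged).total_seconds()
print(f"{elapsed:.3f}s")
```

This comes to 1.381 seconds, in line with the rough figure quoted above.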

Anyway, nice find! This bug has probably existed for ages without anyone reporting it. I think the best workaround for now would be to try to get the EL to build better blocks on short notice. It won't be ideal going through the pipeline of Vouch rejecting the blinded block and then requesting a full block, but it should still work. New releases in a few months should solve the issue once and for all.

@michaelsproul michaelsproul added val-client Relates to the validator client binary HTTP-API builder API labels Dec 20, 2023
@adaszko
Contributor

adaszko commented Dec 20, 2023

Thank you for checking this, @michaelsproul! I totally missed that Lighthouse logs EL hashes rather than CL ones. I will check Geth's mempool settings and logs. Eagerly awaiting new Lighthouse releases too!

@adaszko
Contributor

adaszko commented Feb 27, 2024

> New releases in a few months should solve the issue once and for all.

Hi 👋 Has there been any movement with this issue?

@mcdee
Contributor

mcdee commented Feb 27, 2024

Yes, the various client teams have been tidying up their implementations of the V3 blocks endpoint, and once they are all confirmed compliant with the spec I'll look to release a new version of Vouch that uses this endpoint. It won't be until after dencun, though, because we want to ensure a stable release with the existing endpoints first.

@adaszko
Contributor

adaszko commented Feb 27, 2024

Great, thank you for the update @mcdee!
