miner: Add block building interruption on payload resolution (getPayload) #186
notes from call with proto
When talking about returning the "full" block, are we literally waiting for it to be packed with as many tx as possible, or does the interrupt stop adding tx to the block?
The ideal behaviour when `getPayload` is called is basically:
- If a valid block has already been built and is ready, return that and discard the in-progress block.
- Otherwise, stop adding tx from the pool immediately and finalise the block in progress so it can be returned.
Potentially you could be even smarter: compare the value of the existing block to the value of the block in progress, and finalise the block in progress if it's already worth more.
That may mean that we return a half-full block because that's all we had time to build, but that's a lot better than being late delivering the block. I wonder if that potentially simplifies some of the logic here as well, since FCU would always defer building to the worker thread and `getPayload` would always have a block to return.
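The two cases described above can be sketched roughly as follows. This is a minimal illustration, not op-geth's code: the `block` and `builder` types and the `addTx` and `resolve` methods are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical stand-in for a built payload.
type block struct {
	txs []string
}

// builder is a hypothetical, heavily simplified payload builder.
type builder struct {
	mu          sync.Mutex
	ready       *block // last fully built block, nil if none exists yet
	inProgress  *block // block currently being filled from the pool
	interrupted bool   // set by resolve; stops further tx inclusion
}

func newBuilder() *builder {
	return &builder{inProgress: &block{}}
}

// addTx appends a transaction to the in-progress block unless the
// builder was interrupted. It reports whether the tx was added.
func (b *builder) addTx(tx string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.interrupted {
		return false
	}
	b.inProgress.txs = append(b.inProgress.txs, tx)
	return true
}

// resolve always interrupts further building. If a ready block exists
// it is returned and the in-progress block is discarded; otherwise the
// in-progress block is finalised as-is, even if only half full.
func (b *builder) resolve() *block {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.interrupted = true
	if b.ready == nil {
		b.ready = b.inProgress
	}
	return b.ready
}

func main() {
	b := newBuilder()
	b.addTx("tx1")
	b.addTx("tx2")
	got := b.resolve()
	fmt.Println("txs in returned block:", len(got.txs))
	fmt.Println("tx accepted after resolve:", b.addTx("tx3"))
}
```

The value comparison mentioned above is left out here to keep the sketch to the two basic cases.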
The interrupt stops adding to that "full" block that is currently being built.
@ajsutton It is implemented like this 👍🏻 I now realize that there's one thing that can be optimized: in
This is exactly the implementation that I had up until last week. We then discussed last week that we rather want The process could also happen with a timeout. Say, wait up to 10ms for the interrupted block building process to complete and the comparison to be done, and then return the existing block already if this times out. However, I remember that we discussed that an instant If we want to revisit that decision, I suggest doing that in a follow-up.
Ok, went through this carefully and it looks good to me. Only question is really around the global var and as mentioned, I think we can avoid it but you may have seen something I missed.
@ajsutton thanks for the review. I'm happy with your test workaround as well and will merge it into this PR.
We only build the empty block if we don't use the tx pool. So if we use the tx pool, a forkchoiceUpdated call would miss the implicit validation that's happening during empty block building, so we need to add it back.
This commit changes the way the block builder/update routine and the resolution functions Resolve and ResolveFull synchronize. Resolve(Full) now signal the payload builder to pause and set the interrupt signal in case any block building is ongoing. They then wait for the interrupted block building to complete. This allowed us to simplify the Payload implementation somewhat, because the builder routine is now guaranteed to return before the resulting fields (full, fullFees etc.) are read, and closing of the `stop` channel is now synchronized with a sync.Once. So the mutex and condition variable could be removed, and we only use two simple signalling channels, `stop` and `done`, for synchronization.
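A minimal sketch of the synchronization pattern this commit message describes, assuming a much simpler payload than the real one: the `payload` struct, its `full` counter and the `build` and `resolve` methods are hypothetical names for illustration. Only the `stop` and `done` channels and the use of `sync.Once` come from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// payload is a hypothetical, simplified stand-in for the Payload type.
type payload struct {
	stop     chan struct{} // closed to tell the builder to stop
	done     chan struct{} // closed by the builder once it has returned
	stopOnce sync.Once     // guards close(stop) against repeated resolves
	full     int           // result field; read only after done is closed
}

func newPayload() *payload {
	return &payload{stop: make(chan struct{}), done: make(chan struct{})}
}

// build simulates filling a block with up to txs transactions and
// checks the stop signal between transactions.
func (p *payload) build(txs int) {
	defer close(p.done)
	for i := 0; i < txs; i++ {
		select {
		case <-p.stop:
			return
		default:
			p.full++
			time.Sleep(time.Millisecond) // simulated work per transaction
		}
	}
}

// resolve signals the builder to stop, waits until it has returned and
// only then reads the result, so no mutex is needed around full.
func (p *payload) resolve() int {
	p.stopOnce.Do(func() { close(p.stop) })
	<-p.done
	return p.full
}

func main() {
	p := newPayload()
	go p.build(1_000_000)
	time.Sleep(5 * time.Millisecond)
	fmt.Println("txs included before interrupt:", p.resolve())
	fmt.Println("second resolve is safe:", p.resolve())
}
```

Because closing `done` happens before the receive in `resolve` completes, the read of `full` is ordered after the builder's last write without any additional locking.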
Some tests in the miner and catalyst packages assume that getPayload can be called immediately after forkchoiceUpdated and will then return a built block. Because payload resolution now interrupts any ongoing payload building process, this creates a race condition on block building. The new testing mode, which can be enabled by setting the package variable IsPayloadBuildingTest to true, guarantees that at least one full block is always built. It's hacky, but it seems to be the easiest and least intrusive way to enable the new behavior of payload resolution while still keeping all tests happy.
- Prioritize stop signal over recommit
- Don't start payload building update if last update duration doesn't fit until slot timeout.
When resolving, we don't want to wait for the latest update. If a full block is available, we just return that one, as before. Payload building is still interrupted, but exits in the background.
Use a longer wait in tests for the payload to build.
Reviewed the payload-interrupt code, and the discussion about the half-full block, but I think we are not interrupting correctly on the first payload currently:
The above two things combined mean we can't fire I think we should add a flag to That way we don't get hung up on large 30M block building attempts if it's the first block building attempt. Edit: I missed the
Thanks Proto, I moved the interrupt now before potentially waiting on the conditional, so we interrupt even the first block building routine. It was kind of a deliberate decision at the time to build the first block fully, but it's indeed better to even interrupt the first block, because even that one may take a long time. As you wrote in the edit, I already added the I added another test that confirms that interruption actually works. I now had to add more sleeps to the
Also added interrupt test. Had to add sleep to make non-interrupt test work.
hmm, so there's a failure in op-geth/eth/catalyst/simulated_beacon.go Lines 153 to 172 in a8a9c8e
In #204, I've exposed a couple of methods to deterministically wait for the block building to complete rather than needing a fixed timeout, and hooked that up in the simulated beacon as well. We may need to add appropriate calls to wait in more tests yet, which will be an annoying source of flakiness for a while, but I think this is the right approach to deterministically wait. Definitely open to better ways to expose the ability to wait though.
Also fix a bug in TestNilWithdrawals where the withdrawals weren't added to the ephemeral BuildPayloadArgs instance for re-calculating the payload id.
Refined @ajsutton's solution and fixed a buggy test (
Looks good! Sorry I missed a few minor things in my last review, the PR looks ready otherwise
Also always stop interrupt timer after fillTransactions in generateWork.
Description
[…] `getPayload` engine api call.
[…] `NoTxPool` FCU calls, and as before, FCU blocks on building the empty block. This has the advantage that if generating a lot of empty blocks, `getPayload` can immediately be called after `forkchoiceUpdated` to return that empty block.
[…] `getPayload`) now waits for the first full block to be built, as there is no empty block to return. If at least one block has already been built, as before, it is returned immediately, while block building is interrupted in the background.

Tests
Extended the `TestBuildPayload` test to cover both cases of `NoTxPool`.
Added an ugly global bool to indicate that a test is running, which guarantees one full block is built and block building is not interrupted. This seemed to be the least-invasive way to make all tests still pass. This was necessary because a couple of tests call `Resolve` or `getPayload` immediately after a FCU, but this would now interrupt the block building process, which made tests fail.

Additional context