Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Span batch specs to allow overlapping batches #7259

Conversation

ImTei
Copy link
Collaborator

@ImTei ImTei commented Sep 15, 2023

Context

  • Current span batch spec does not allow span batches that overlaps current safe chain.
  • So the op-node cannot proceed derivation if the pipeline is reset to the middle of span batch.
  • It's very hard to guarantee that the op-node will always reset the safe head to the boundary of the span batch.
  • Therefore we need to allow overlapping span batches.

Changes

  • Specs
    • Updated span batch validation rules to allow overlapping batches

@ImTei ImTei requested review from a team as code owners September 15, 2023 11:45
@ImTei ImTei requested review from maurelian and refcell September 15, 2023 11:45
@mergify mergify bot added the M-docs Meta: documentation related label Sep 15, 2023
@trianglesphere trianglesphere self-requested a review September 15, 2023 16:12
@ImTei ImTei force-pushed the tip/allow-overlapping-span-batch branch from 48ec1c1 to 0233631 Compare September 18, 2023 07:24
@refcell refcell removed their request for review September 19, 2023 16:41
@github-actions
Copy link
Contributor

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 14, 2023
@ImTei
Copy link
Collaborator Author

ImTei commented Oct 16, 2023

@protolambda Please review this PR when you get a chance

@protolambda protolambda mentioned this pull request Oct 2, 2023
2 tasks
@github-actions github-actions bot removed the Stale label Oct 17, 2023
Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overlapped case is pretty weird, but we need to pay very careful attention to it.

Starting from normal operation, my assumption is that when we have a span batch which contains first a string of valid batches & then an invalid batch & then valid batches again, we will process all the valid batches, reject the invalid batch, and then wait for a sequence window timeout before discarding the span batch & processing more.

Looking at the implementation, we actually just pop all of the batches out of the span batch regardless of if a batch was successfully applied or not. It will get to the attributes queue & will reset the entire pipeline.

specs/span-batches.md Outdated Show resolved Hide resolved
specs/span-batches.md Show resolved Hide resolved
@ImTei
Copy link
Collaborator Author

ImTei commented Oct 19, 2023

@trianglesphere Thanks for the comment. It seems to be a very critical point.
so I understand that we need to consider the atomicity of the span-batch. This is a different topic than overlapping batch. Is it correct?

  • As you said, we don't check if the previous singular batch is applied successfully and just pop the next batch in the current implementation of span batch derivation.
  • We must check that. This can be done simply by comparing the given safe head's block hash and the cached batch's parent hash.
  • And if some singular batch derived from the span batch fails to be applied, we have to discard the entire span batch. It means that we need to reorg the chain to the state before the span batch.
  • This reorg does not mean resetting the entire pipeline. we should just reset the safe head and keep the rest states.

@protolambda what do you think?

@trianglesphere
Copy link
Contributor

@ImTei, yep, we do need to consider the atomicity. I would also take a step back & think about what behavior we want if there is an invalid batch inside a span batch. The other thing is that we need to think about this in the context of re-orgs & also other singular batches which have been included.

@ImTei
Copy link
Collaborator Author

ImTei commented Oct 20, 2023

@trianglesphere @protolambda

I have specified the problematic scenario and what to implement. I'll implement those things in the next follow-up PR. let me know if there are any concerns or suggestions! :)

  • Even if the span batch passed the batch queue’s validity checks, a singular batch from the span batch might fail to be applied to the safe chain due to the BlockInsertPayloadErr.
  • In this case, we suppress the error and retry the pipeline with the next batch from the batch queue.
  • If this case occurs in the middle of a span batch, i.e. we already applied some singular batches from the span batch to the engine, we must revert them for the atomicity of span batches.
  • Therefore, we need the following behaviors when BlockInsertPayloadErr occurs during processing span batches:
    • Engine queue must trigger safe chain reorg by resetting its safe head
    • Batch queue must reset cached singular batches derived from the reverted span batch.
  • And this reorg must be possible, even if we’re deriving the already finalized blocks. So we have to finalize the L2 blocks when an entire span batch is successfully applied to the chain.
    • To achieve this, the pipeline should pass the signal to indicate whether the current processing singular batch is the last one from the span batch.
    • And the engine queue must consider this signal to schedule the L2 finalization, and to use it for the reorg target block.

specs/span-batches.md Outdated Show resolved Hide resolved
@ajsutton
Copy link
Contributor

I commented in #7621 about the atomicity issues being detected here. If we're delaying the update of the finalized block until the end of the span batch I think we should also consider delaying the update of safe head - that's another step towards actually applying span batches atomically. We'd still have changes to the l1 origin and things like batch queue l1Blocks which are hard to roll back though.

Whichever we go technically, I'd encourage having a really strong focus on simplifying things so rather than trying to handle interesting corner cases like span batches being partially applied, we try to find approaches that reduce the number of corner cases and simplify the way we need to reason about the protocol.

@ImTei ImTei mentioned this pull request Oct 26, 2023
@ImTei ImTei force-pushed the tip/allow-overlapping-span-batch branch from f48526b to 81cbf88 Compare October 26, 2023 06:26
@protolambda protolambda enabled auto-merge November 7, 2023 23:31
@protolambda protolambda added this pull request to the merge queue Nov 7, 2023
Merged via the queue into ethereum-optimism:develop with commit 5d79b40 Nov 8, 2023
57 checks passed
@protolambda protolambda deleted the tip/allow-overlapping-span-batch branch November 8, 2023 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-docs Meta: documentation related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants