-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Header Accumulator #259
base: main
Are you sure you want to change the base?
Header Accumulator #259
Conversation
8386bf5
to
8667e2f
Compare
ab2c9bf
to
9a5198d
Compare
75612be
to
d635603
Compare
957f4b1
to
e633af0
Compare
For header accumulation blocks (`block.number % HEADER_BATCH_SIZE == 0`), the accumulator tree root | ||
as well as the merkle stack should be encoded as follows: | ||
|
||
```txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious reviewers' thoughts. Should we also append the header batch root to this? Small bit extra data, though makes it so that we wouldn't need to reconstruct the batch tree off-chain for finding leaves of the accumulator tree. It should be a quick process (with a reasonable HEADER_BATCH_SIZE
), and they'll need the full batch tree for inclusion proofs in the batch, though might be nice to have a quick lookup for batch roots as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the size of the redundant data being attached? And would we do any validation that the batch-root included matches what is calculated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did add this, it would be an extra 32 bytes per HEADER_BATCH_SIZE
blocks in the extraData
field. We would need to add a validation step, where we verify that the leaf (the header batch tree root) is verified to be included within the tree that the merkle stack represents.
e633af0
to
d1b2f63
Compare
This stack of pull requests is managed by Graphite. Learn more about stacking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our Design Doc Template has "Alternatives Considered" and "Risks and Uncertainties" sections which I've found really helpful in the past for my own understanding of the problem space.
Since we're in specs
, maybe it doesn't apply the same way, but in lieu of that, a couple questions:
Seems like the problem we're trying to solve is the impending spike in compute requirements when there are many interoperating superchains that all require header-reading. It'd be more efficient to compute the headers all as a batch and to only emit/expect them once per batch-window.
Does this header-batch-window size inform the resolution at which interop can run? Seems like while you wait for the next header batch, you can't make use of any of the other blocks beyond doing the work (which is what we're trying to avoid).
My other question is around the selection of the Merkle Tree. Seems a good choice for compact storage, but this doc also points out that it only requires two merkle proofs as well as _n_ merkle patricia trie proofs to verify inclusion
. This seems like an important feature, but I don't fully understand how that mechanism is used by nodes to achieve the goal of only a single L2 block at a height that includes _all receipts_ on a single chain that must be validated to resolve its dependents
. Would you mind sketching that out a bit? This could be something very obvious to folks more familiar with Merkle Tree than I am.
## Rationale | ||
|
||
After the activation of the [interop hardfork](../interop/overview.md), the computational complexity of proving the | ||
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains | |
OP Stack's state transition will multiply(?). The current plan for proving the inclusion of remote logs within chains |
Might be in this doc farther down, but what is the relationship number-of-chains, total gas, and compute required? I'm thinking it's linear with the total gas of all chains, but not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure either, or at least not sure enough to give it a concrete time-complexity.
From what I do know, it will multiply, with a minima of linear complexity increase (scaling with # of chains in the dependency set). Though, due to us having to reproduce the L2 block for every block that a remote log is included in, the computational complexity also grows with the number of relayed messages in the current frame (note: I'm using "frame" here as a non-official term; What I mean is the "frame" in which we're resolving cross-chain dependencies.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximum we have to execute in a single FP VM run is a single L2 block (from a single chain). We can extend the output root bisection game to avoid having to execute a block from every L2 chain in the set within a single VM execution. So the cost to having additional chains is pretty negligible. The issue is actually just how old a log can be referenced by an executing message because we have to walk back one block at a time (by hash) to retrieve that referenced log which gets expensive if it's a long way in the past.
| Term | Description | | ||
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| `ACCUMULATOR_TREE_DEPTH` | `27` | | ||
| `HEADER_BATCH_TREE_DEPTH` | `5` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes out to a 32 leaf nodes, and I think that means a 32x performance optimization, is that correct? As the header concerns now happen only once per HEADER_BATCH_SIZE
?
Do any behaviors change if we find that this needs to be increased further? Like, if we had 64 per batch, would we start to put pressure on other parts of the system in negative ways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the optimization. HEADER_BATCH_SIZE
x less data stored in extraData
(since extraData
is only expanded once every HEADER_BATCH_SIZE
blocks)
I would advise not making HEADER_BATCH_SIZE
> 256, since most execution layer clients drop state > 256 blocks old out of memory. With HEADER_BATCH_SIZE <= 256
, the lookups for the header hashes should be very quick, since we can directly consult in-memory state at tip.
For header accumulation blocks (`block.number % HEADER_BATCH_SIZE == 0`), the accumulator tree root | ||
as well as the merkle stack should be encoded as follows: | ||
|
||
```txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the size of the redundant data being attached? And would we do any validation that the batch-root included matches what is calculated?
This choice was made to allow for efficient batching of data, and to reduce historical state expansion. By splitting the incremental tree, and including batch roots as leaves, we have a very nice property where only every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of a hard time picturing how this winds up being used in practice to verify that a log exists on L2. It would be really useful to have a worked example showing how you get from a L2 block hash, back to a receipt from a historic block where all data is accessed by getting the preimage of a hash. Or I guess we could have a new PreimageOracle data type that verifies a provided merkle tree so you can supply a single proof.
each chain, which currently could require deriving and executing many L2 blocks in the context of the verifiable | ||
environment that the proving system is executing within. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to derive multiple L2 blocks in a single execution - we break that down as part of the modified output root bisection so that we are only executing a single L2 block at a time. The final consolidation step requires checking if the cross-chain dependencies are met by checking if a referenced log exists for each executing message. "Fetching" the referenced log message uses the same approach as for fetching L1 logs - we can walk backwards from the chain head via parentHash to the block, then down the receiptHash trie and get the log by hash.
So we don't need to execute any L2 blocks as part of this step, but currently we have to fetch each header back to the block the claimed log is from which, if the required block is quite old can be very expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! There are currently no specifications for fault proofs post interop, but the output bisection modification makes sense to reduce derivation + execution work. Can we please get this written down somewhere in the specs?
"Fetching" the referenced log message uses the same approach as for fetching L1 logs - we can walk backwards from the chain head via parentHash to the block, then down the receiptHash trie and get the log by hash.
The proposal seeks to optimize this walkback on L2 during the consolidation step by making the lookup constant time, rather than scaling linearly with the depth of the state relative to tip. Especially for messages that are relayed far after they are originally sent. Interop, as far as I understand it, places no limitations on how old a message can be on a remote chain in order for it to be relayed. Relay protection is on the consumer of the relayed logs, meaning we still may have to prove relayed logs that are incredibly deep in historical state. Is this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fault proving interop isn't happening for a while, but I made some notes in https://www.notion.so/oplabs/External-Interop-Fault-Dispute-Game-Notes-1537bf9fad054bcfb2245dea88d48d16?pvs=4 - that links to the current draft PR of the specs but it will need some changes. The key thing though is that applying blocks from multiple chains in a single cannon run is never going to work so we definitely won't be doing that. :)
There is a limit on the log you can reference, currently it's 180 days which is far too long to be fault provable without something like this: https://specs.optimism.io/interop/messaging.html#messaging-invariants
## Rationale | ||
|
||
After the activation of the [interop hardfork](../interop/overview.md), the computational complexity of proving the | ||
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximum we have to execute in a single FP VM run is a single L2 block (from a single chain). We can extend the output root bisection game to avoid having to execute a block from every L2 chain in the set within a single VM execution. So the cost to having additional chains is pretty negligible. The issue is actually just how old a log can be referenced by an executing message because we have to walk back one block at a time (by hash) to retrieve that referenced log which gets expensive if it's a long way in the past.
optimization for interop. For example, in the fault proof program, this feature removes the need for a commit-reveal | ||
walkback when retrieving data within the historical chain. Instead, we would only need to provide small inclusion | ||
proofs for a constant-time lookup of any data in the historical state accumulator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know of any place where we access historical information for the L2 chain - only for L1 which won't have this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't currently, but post-interop, we will have to in order to verify log inclusion for relayed logs where there was a significant delay between the log's emission on chain A and the log being relayed on chain B.
We also do have the walkback on L2 today in the program, though it's not very bad due to output bisection. When fetching PayloadByNumber
, etc., we must walk back from the starting output root (the current safe head) when the program is initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this will definitely help with interop, just the way I read this section it sounded like it was saying it would help with other exisiting behaviour for fault proofs today, but I don't believe there is anything that would benefit. The agreed starting block from the output root bisection is the unsafe, safe and finalized head initially so the L2 walkback just exits the loop immediately because it has already reached the finalized block.
``` | ||
|
||
Both the global accumulator tree as well as the header batch tree use `keccak256` as their hashing function. For all | ||
nodes within the header accumulator tree, all commitments are shortened to `TRUNCATED_COMMITMENT` bytes in length to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note is that if we only have a truncated commitment, we can't use it to retrieve data from the preimage oracle - we need the full hash for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a new PreimageKeyType
for this, yes? Should be the same as the keccak
route, just need to truncate the resulting digest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh I think that should work. This is where I think having those details fleshed out would be really helpful - we don't want to implement this spec and then find there's something that's suboptimal for actually using it in fault proofs, so it's worth making sure we're actually thinking through all the details of how it will be used.
Specifies an addition to block execution in the Granite hardfork, which designates "receipt accumulation" blocks and an addition to the `extraData` field in "receipt accumulation" blocks' headers to enable this functionality.
ba9d971
to
404bf07
Compare
As agreed on the ENG staff meeting, I'm assigning this to @protolambda as lead implementer that will own this side of work for Granite hardfork. |
Remove the execution diff to the interop specs given it is being replaced by #259
Overview
Specifies an addition to block execution in the Isthmus hardfork, which designates "header accumulation" blocks and an addition to the
extraData
field in "header accumulation" blocks' headers to enable this functionality.TODO
Possible Ideas