Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use finalized reference block #350

Merged
merged 6 commits into from
Mar 20, 2024

Conversation

mooselumph
Copy link
Collaborator

Why are these changes needed?

Uses the a finalized block for the operator state instead of the most recent block.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

Copy link
Contributor

@jianoaix jianoaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a good idea to reference a finalized block

@@ -177,6 +177,13 @@ var (
EnvVar: common.PrefixEnvVar(envVarPrefix, "MAX_BLOBS_TO_FETCH_FROM_STORE"),
Value: 100,
}
FinalizationBlockDelayFlag = cli.UintFlag{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we query Beacon chain for finalized epoch to get the finalized block?

@@ -28,7 +28,7 @@ abstract contract EigenDAServiceManagerStorage is IEigenDAServiceManager {
* time that nodes can be active after they have deregistered. The larger it is, the farther back stakes can be used, but the longer operators
* have to serve after they've deregistered.
*/
uint32 public constant BLOCK_STALE_MEASURE = 150;
uint32 public constant BLOCK_STALE_MEASURE = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was this calculated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number has to be greater than FinalizationBlockDelay + BatchInterval/12

If the batch interval is 10 minutes, this evaluates to 150 blocks. I doubled it to ensure we were never running up against this constraint.

I think that 30 minutes has negligible UX implications, given that it is small relative to the 14 days that operators are expected to serve after deregistration.

cc @gpsanant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, add this to the comment? It's useful to know and to maintain this value in the future.

On the other hand it raises maintenance question as already 2 offchain parameters can affect this onchain parameter (or conversely, this impacts the choice of batch interval). 10 mins batch interval is 10*60/12=50 blocks, it means it cannot be significantly larger

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default FinalizationBlockDelay is 75

Usage: "The block delay to use for pulling operator state in order to ensure the state is finalized",
Required: false,
EnvVar: common.PrefixEnvVar(envVarPrefix, "FINALIZATION_BLOCK_DELAY"),
Value: 75,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to wait for finalization? What's the downside of using a non-final reference block?
My understanding of the reason why we may want to use an older reference block is to prevent requesting newer blocks than what indexer has indexed. For this purpose, 5~10 blocks seems sufficient.
One downside of having this number too big might a UX concern where operator state is applied that much later. wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage is that the OperatorState which we use is unlikely to change due to a reorg. Using 15 minutes here makes it very unlikely that we'll ever fail a batch due to a reorg as we were seeing on goerli.

I'm not really sure that a period of 15 minutes has any important UX drawbacks. We are mostly moving in the direction of decreasing the frequency of state updates (AVSSync takes us to stake updates once per week and we may do something similar with operator registrations), so I think 15 minutes here doesn't really pose a big issue.

I may be missing something though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the minimal impact on UX and the advantage of this approach.
UX drawback was the only thing i can think of, but my concern is if there is anything else we might be missing here because we've only tested under the scenario where reference block was very recent, and we haven't seen any issues with operator state being reorg'd.
This is probably fine, but we should definitely test this in the wild for some time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we haven't seen any issues with operator state being reorg'd.

Haven't we? I think the assumption has been that the reorg issues we've been seeing on testnet were exactly this.

But yeah, let's see if it causes any unexpected issues. I can't think if any right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another part is the operator experience where they will be held longer for dispersal after opt-out.
Overall I think it's an improvement for robustness.

Copy link
Contributor

@bxue-l2 bxue-l2 Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to this twitter thread, it looks like on average 20 reorg per day. So in the worst case, need to wait 20 blocks. https://twitter.com/terencechain/status/1768664666996355178

Copy link
Collaborator Author

@mooselumph mooselumph Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does 20 reorgs / day mean we need to wait 20 blocks? I don't understand why these numbers would be related.

@@ -45,7 +45,6 @@ services:
SRS_LOAD: 2900
CHALLENGE_ORDER: 3000
LOG_LEVEL: "debug"
LOG_FORMAT: "text"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not want this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is set to "json", so I was going with that. Is there a reason we want text logging?

Copy link
Contributor

@jianoaix jianoaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -28,7 +28,7 @@ abstract contract EigenDAServiceManagerStorage is IEigenDAServiceManager {
* time that nodes can be active after they have deregistered. The larger it is, the farther back stakes can be used, but the longer operators
* have to serve after they've deregistered.
*/
uint32 public constant BLOCK_STALE_MEASURE = 150;
uint32 public constant BLOCK_STALE_MEASURE = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, add this to the comment? It's useful to know and to maintain this value in the future.

On the other hand it raises maintenance question as already 2 offchain parameters can affect this onchain parameter (or conversely, this impacts the choice of batch interval). 10 mins batch interval is 10*60/12=50 blocks, it means it cannot be significantly larger

@mooselumph mooselumph merged commit 30b6bb4 into Layr-Labs:master Mar 20, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants