OP proposer is stalled for 25k blocks in devnet version v0.1.0-OP_v1.0.6-CN_v0.9.5 on devnet #54

Closed
jcstein opened this issue May 22, 2023 · 9 comments · Fixed by #133
Labels
bug Something isn't working

Comments

@jcstein
Member

jcstein commented May 22, 2023

The "safe" block is 25000 blocks behind the "latest" block.
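For anyone who wants to reproduce the measurement, here is a minimal sketch of how the gap between the two heads could be computed. It assumes the L2 execution client accepts the standard `eth_getBlockByNumber` JSON-RPC method with the `latest` and `safe` block tags. The helper names and the sample block numbers are hypothetical and are not taken from the logs.

```python
import json


def block_number_request(tag, request_id=1):
    """Build the JSON-RPC payload for eth_getBlockByNumber with a block tag.

    `tag` is a block tag such as "latest" or "safe"; False asks for
    transaction hashes only rather than full transaction objects.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByNumber",
        "params": [tag, False],
    })


def parse_block_number(response):
    """Read the block number (a hex quantity string) from a decoded response."""
    return int(response["result"]["number"], 16)


def safe_lag(latest_response, safe_response):
    """Return how many blocks the safe head trails the latest head."""
    return parse_block_number(latest_response) - parse_block_number(safe_response)


# Made-up responses for illustration only: latest = 115000, safe = 90000.
latest = {"result": {"number": "0x1c138"}}
safe = {"result": {"number": "0x15f90"}}
print(safe_lag(latest, safe))
```

The payloads from `block_number_request` would be POSTed to the node's RPC endpoint; the transport is left out here so the sketch stays independent of any particular deployment.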

Here are the full logs

[Two screenshots: 2023-05-22 at 12:04:11 PM and 2023-05-22 at 12:04:05 PM]
jcstein added the bug label May 22, 2023
@TheAppleTucker

I think this has to do with the speed of the derivation pipeline in op-node. It seems quite slow to sync call_data transactions, so the safe blocks tend to lag behind.

@jcstein
Member Author

jcstein commented May 22, 2023

Thank you @TheAppleTucker. Do you think being as much as 25k blocks behind is expected, based on the config?

jcstein changed the title from "OP proposer is stalling in devnet version v0.1.0-OP_v1.0.6-CN_v0.9.5" to "OP proposer is stalled for 25k blocks in devnet version v0.1.0-OP_v1.0.6-CN_v0.9.5" on May 22, 2023
jcstein changed the title from "OP proposer is stalled for 25k blocks in devnet version v0.1.0-OP_v1.0.6-CN_v0.9.5" to "OP proposer is stalled for 25k blocks in devnet version v0.1.0-OP_v1.0.6-CN_v0.9.5 on devnet" on May 23, 2023
@tuxcanfly
Collaborator

tuxcanfly commented May 30, 2023

There seem to have been a lot of panics in the logs above:

grep panic 115k-devnet-logs.log

This should be fixed by celestiaorg/celestia-node#2244

So we should test this again against v0.10.2

@jcstein
Member Author

jcstein commented May 30, 2023

Fortunately, there were no panics in last week's run on v0.10.1: #120 (comment)

@jcstein
Member Author

jcstein commented May 30, 2023

Unfortunately, the proposer is still stalling, and we don't yet know which of the errors thrown is the cause.

@tuxcanfly
Collaborator

I debugged the logs mentioned in #120

#120 (comment)

Long shot, but are we sure it was running with the proposer timeout fix?

@jcstein
Member Author

jcstein commented May 30, 2023

Yes, it was running with the proposer timeout fix: #120 (comment)

@tuxcanfly
Collaborator

It seems that node requests stall whenever the "head request to trusted peer failed" error occurs:

celestiaorg/celestia-node#2241 (comment)

This seems to be happening in the node and then spilling over to the proposer. We will need to investigate whether bumping the node timeout is a workaround for this.

@tuxcanfly
Collaborator

OK, I think I found the issue. The proposer stalls while waiting for the safe L2 head to move ahead of the checkpoint head. Because an op-node error prevented the proposer from updating the safe head, it can't progress and goes into a livelock.

If this is indeed the bug, then either increasing the op-node timeout or setting OP_PROPOSER_ALLOW_NON_FINALIZED: "false" should fix the issue.
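To make the suspected failure mode concrete, here is a rough sketch of the waiting pattern described above. This is not the op-proposer code: `wait_for_safe_head`, `NodeError`, and `SafeHeadStalled` are hypothetical names, and the deadline is an illustrative addition. Without the deadline check, a safe head that never advances (for example because every op-node request errors out) would keep the loop spinning forever, which is the livelock.

```python
import time


class NodeError(Exception):
    """Hypothetical stand-in for an op-node request failure."""


class SafeHeadStalled(Exception):
    """Raised when the safe head does not pass the checkpoint in time."""


def wait_for_safe_head(get_safe_head, checkpoint, timeout_s, poll_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the safe head is ahead of `checkpoint` or the deadline passes.

    `get_safe_head` is a callable returning the current safe L2 block number;
    it may raise NodeError when the node request fails.
    """
    deadline = clock() + timeout_s
    last_seen = None
    while True:
        try:
            last_seen = get_safe_head()
        except NodeError:
            # A failed request leaves the previously seen safe head unchanged.
            pass
        if last_seen is not None and last_seen > checkpoint:
            return last_seen
        if clock() >= deadline:
            # Surface the stall instead of looping forever.
            raise SafeHeadStalled(
                f"safe head {last_seen} did not pass checkpoint {checkpoint}"
            )
        sleep(poll_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays. Whether a bounded wait like this is the right remedy, as opposed to the two options suggested above, is a separate question.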


3 participants