Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-node: continue on retrieval error #142

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

tuxcanfly
Copy link
Collaborator

@tuxcanfly tuxcanfly commented Jun 6, 2023

Overview

There is a persistent issue with the rollup node not progressing on safe blocks. Upon inspection, the rollup node tries to fetch frame from data but fails due to:

WARN [06-06|11:04:20.344] unable to retrieve data from da          origin=b83465..d34c02:9130159 err="Get \"http://localhost:26659/namespaced_data/00000000000000000000000000000000000000f92140012837778721be/height/36708\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
WARN [06-06|11:04:20.346] Failed to parse frames                   origin=b83465..d34c02:9130159 err="parsing frame 0: reading channel_id: unexpected EOF"

In case of an error during da retrieval, we can continue so that the derivation can proceed without trying to parse (and failing) invalid frames.

Checklist

  • New and updated code has appropriate documentation
  • New and updated code has new and/or updated testing
  • Required CI checks are passing
  • Visual proof for any user facing features like CLI or documentation updates
  • Linked issues closed with keywords

@tuxcanfly tuxcanfly requested a review from jcstein June 6, 2023 23:28
@jcstein
Copy link
Member

jcstein commented Jun 6, 2023

LGTM :shipit:

@jcstein jcstein merged commit 5c9b975 into celestia-develop Jun 6, 2023
@tuxcanfly
Copy link
Collaborator Author

tuxcanfly commented Jun 7, 2023

This wasn't quite the right fix, because the rollup node doesn't know to retry the temporary error, so it skips the frame.

WARN [06-06|17:58:04.375] requesting celestia namespaced data      origin=0e2085..27f227:9131334 namespace=00000000000000000000000000000000000000f92140012837778721be height=37852
WARN [06-06|17:58:04.377] unable to retrieve data from da          origin=0e2085..27f227:9131334 err="Get \"http://localhost:26659/namespaced_data/00000000000000000000000000000000000000f92140012837778721be/height/37852\": dial tcp [::1]:26659: connect: connection refused"
INFO [06-06|17:58:05.569] Advancing bq origin                      origin=e09194..029bca:9131335 originBehind=false
...
WARN [06-06|17:58:46.077] requesting celestia namespaced data      origin=a578de..f6f3e2:9131376 namespace=00000000000000000000000000000000000000f92140012837778721be height=37891
WARN [06-06|17:58:46.256] retrieved data                           origin=a578de..f6f3e2:9131376 data=00...
INFO [06-06|17:58:46.266] created new channel                      origin=a578de..f6f3e2:9131376 channel=af7f0b..1284a6 length=3405  frame_number=0 is_last=true
INFO [06-06|17:58:46.266] Reading channel                          channel=af7f0b..1284a6 frames=1

Note that after failing to read celestia block 37852, it skips and reads 37891 next.

We need to return and handle the sentinel NewTemporaryError introduced in #43 in NextData so that NextFrame doesn't try to parse it in ParseFrames. Looks like the sentinel is not used in this context as there was no error possible here previously (in Ethereum DA layer).

@jcstein
Copy link
Member

jcstein commented Jun 7, 2023

merged this prematurely, so think we can revert the change in another PR 👍

@tuxcanfly
Copy link
Collaborator Author

tuxcanfly commented Jun 7, 2023

Cool. FWIW here's a way to reliably repro this.

Temporarily bring the celestia light node down until there's at least one

WARN [06-06|17:58:04.377] unable to retrieve data from da          origin=0e2085..27f227:9131334 err="Get \"http://localhost:26659/namespaced_data/00000000000000000000000000000000000000f92140012837778721be/height/37852\": dial tcp [::1]:26659: connect: connection refused"

error in the op-node logs. Now when celestia light node is bought up again, the derivation should resume from the last failed da block.

This simulates a temporary node error, for e.g.

 context deadline exceeded (Client.Timeout exceeded while awaiting headers)

which was causing the original issue with lagging safe blocks.

@tuxcanfly
Copy link
Collaborator Author

Followed up in #144

@jcstein jcstein deleted the tux/fix-op-node-safe-lag branch June 7, 2023 18:07
tuxcanfly added a commit that referenced this pull request Oct 10, 2023
tuxcanfly added a commit that referenced this pull request Nov 2, 2023
tuxcanfly added a commit that referenced this pull request Nov 2, 2023
tuxcanfly added a commit that referenced this pull request Nov 2, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 3, 2023
tuxcanfly added a commit that referenced this pull request Nov 4, 2023
tuxcanfly added a commit that referenced this pull request Nov 4, 2023
tuxcanfly added a commit that referenced this pull request Nov 4, 2023
tuxcanfly added a commit that referenced this pull request Nov 7, 2023
tuxcanfly added a commit that referenced this pull request Nov 8, 2023
tuxcanfly added a commit that referenced this pull request Nov 9, 2023
tuxcanfly added a commit that referenced this pull request Nov 10, 2023
tuxcanfly added a commit that referenced this pull request Nov 10, 2023
tuxcanfly added a commit that referenced this pull request Nov 10, 2023
tuxcanfly added a commit that referenced this pull request Nov 13, 2023
tuxcanfly added a commit that referenced this pull request Nov 14, 2023
tuxcanfly added a commit that referenced this pull request Nov 14, 2023
tuxcanfly added a commit that referenced this pull request Nov 14, 2023
tuxcanfly added a commit that referenced this pull request Nov 14, 2023
tuxcanfly added a commit that referenced this pull request Nov 14, 2023
tuxcanfly added a commit that referenced this pull request Nov 15, 2023
tuxcanfly added a commit that referenced this pull request Nov 15, 2023
tuxcanfly added a commit that referenced this pull request Nov 17, 2023
tuxcanfly added a commit that referenced this pull request Nov 20, 2023
tuxcanfly added a commit that referenced this pull request Nov 20, 2023
tuxcanfly added a commit that referenced this pull request Nov 28, 2023
batcher: fix duplicate cli flag name

docker: use latest devnet image

bump local celestia devnet to 0.9.1

docker: wait for da to boot

op-ndoe: decodeeETHData - use uint32 for height

proposer: fix da rpc, estimate gas

op-node: fix failed to parse frame error

proposer: fix revert

proposer: restore gas limit; add explanation

proposer: nit

add celestia logo, update readme

Update README.md

style size

plus not x

docs: edit intro

Update README.md

Update README.md

nit

docs: add section on deposit bridge

add workaround from @Ferret-san

Co-Authored-By: Diego <[email protected]>

close code block

Co-Authored-By: Diego <[email protected]>

fix: typo

Co-Authored-By: Diego <[email protected]>

Update README.md

docs: add bridging section from guidance from @Ferret-san

Co-Authored-By: Diego <[email protected]>

deps: bump local-devnet to 0.9.2

docker: fix da container name

op-node: gracefully retry da connection err

test: update e2e tests for celestia da

docs: update readme

deps: update local-celestia-devnet to v0.9.5

Add DALC support (#47)

* deps: add vars to devnet-up.sh script

* add new docker-compose and script

* fix: add testnet-up to makefile

* add testnet-clean and testnet-down make cmds

* match other instances of localhost

* fix: paths

fix: paths in Makefile

* remove unnecessary vars

* docs: update link in readme

* fix: use light node image

* add celestia light node image

* fix: start DA as dependency to batcher

* docs: copy edits

* docker: testnet - mount celestia light volume

* celestia: remove hardcoded key; add README

* Update README.md

* Update op-celestia/blockspacerace-data/README.md

* edit readme

* docker: celestia startup - fix permission denied

* revert changes from dde9c93

---------

Co-authored-by: Javed Khan <[email protected]>

fix: update path to node store (#52)

devnet: increase block times (#50)

* devnet: increase block times

* proposer: bump network timeout

* proposer timeout: 5 -> 5s

* proposer: tweak timeout to 180s

* fix: ymlup

---------

Co-authored-by: joshcs.eth <[email protected]>

deps: bump celestia-node & local-celestia-devnet image version to v0.10.0 (#56)

* deps: bump celestia-node v0.9.5 to v0.10.0

* change RPC

* swap RPC

* deps: bump v0.9.5 to v0.10.0 local-celestia-devnet

op-node: log namespaced data details (#94)

* op-node: log namespaced data details

* op-node: log hex

Dependabot and Fork Sync (#93)

* .github: add sync fork workflow and update dependabot to only check .github and go.mod

* .github: update dependabot.yml

fix: resolve `max_subscriptions` and` share sequence` errors (#120)

* change RPC to P-OPS

* deps: change to test version of node

celestiaorg/celestia-node#2257

* fix: version for image

* use sha

* change path...

docs: add wiki to README

proposer: allow non finalized true -> false (#133)

deps: update to node RC for Arabica (#132)

* deps: update to node RC for Arabica

* deps: update for arabica

* add changes from #133 to test

* storage in arabica-6 not arabica-8

* deps: update RPC

* deps: add host.docker.internal for testing local light node (not docker)

* Update docker-compose-devnet.yml

* Revert "deps: add host.docker.internal for testing local light node (not docker)"

This reverts commit 09490fd.

* Revert "Update docker-compose-devnet.yml"

This reverts commit cccaf66.

* increase namespace ID to 28 bytes

* update RPC and to arabica-8 for node store

* deps: bump node to rc2

* Update docker-compose-testnet.yml

* deps: bump go-cnc to 0.4.1, fix breaking changes

* Update docker-compose-devnet.yml

* bump gas fee

* revert 71a3d18

this is so that #133 can be merged before this PR

---------

Co-authored-by: Javed Khan <[email protected]>

deps: bump celestia-node to v0.11.0-rc2 (#137)

* deps: bump celestia-node to v0.11.0-rc2

* docs: match celestia-node#2277

deps: remove trailing slash (#139)

op-node: continue on retrieval error (#142)

op-node: calldata source - reset pipeline on err (#144)

op-node: da-rpc, namespace-id - use environment flags (#158)

* op-node: use global string for da flags

* op-node: use flag name

* testnet: better healthcheck

* testnet: restore rpc flags

testnet: update node rc, flags for arabica-9 (#157)

* testnet: update node rc, flags for arabica-9

* go mod: bump go-cnc to 0.4.2-pre

* testnet: bump celestia-node to v0.11.0-rc6

* txmgr: submit pfb - handle unexpected response code; bump fee

* nit: typo

* go mod: bump go-cnc to v0.4.2

* txmgr: submit pfb - handle unexpected response - check tx hash

* txmgr: bump gas limit

* txmgr: add additional height sanity check

devnet: bump local-celestia-devnet to v0.11.0-rc6 (#168)

* devnet: fix op-batcher depends_on

* devnet: bump local-celestia-devnet to v0.11.0-rc6

* devnet: da - bump start_period and interval

* testnet: use new namespace id

* fix: celestia-node version and remove deprecated flags

---------

Co-authored-by: joshcs.eth ᵍᵐ <[email protected]>

deps: local-celestia-devnet -> v0.11.0-rc8 (#175)

da: use celestia-openrpc (#187)

* op-node: use openrpc to get blobs

* op-batcher: use openrpc to submit blobs

* op-celestia-: use openrpc to fetch blobs

* da: update rpc port

* docker: update rpc port

* readme: devnet - rm unused port

* docker: devnet - restore platform

docker: testnet add debug log flag (#189)

da: celestia-openrpc - use blob commitment (#196)

* op-batcher: use openrpc to submit blobs

* op-celestia-: use openrpc to fetch blobs

* da: update rpc port

* docker: update rpc port

* readme: devnet - rm unused port

* docker: devnet - restore platform

* da: use blob commitment instead of tx index

* celestia: use binary interfaces

* go mod: tidy

* deps: bump [email protected]

* pkg celestia: test - add invalid case

* deps: restore [email protected]

da: use celestia da; update config

da: go mod tidy

da: update devnet docker-compose.yml

da: fix devnet script; op-stack-go docker; op-celestia

.github: update base repo owner (#223)

* .github: update base repo owner

* add use of PAT for sync action

---------

Co-authored-by: Matthew Sevey <[email protected]>

.github: fix base branch (#226)

op-server: bump celestia-openrpc to v0.3.0

go mod tidy

chore: update devnet version to v0.12.1 (#245)

* chore: update devnet version to v0.12.1

* docs: update readme to v0.12.1 devnet

* chore: update celestia description

* chore: update celestia-node to v0.12.0

da: move batcher changes from op-service to batcher/driver

major rewrite and tech debt cleanup

* batcher changes moved from op-service to batcher driver
* da flags consolidated into da.rpc, da.namespace-id, da.auth-token
* batcher / proposer da flags removed
* batcher fetches da config from op-node / same as rollup config

da: celestia-openrpc -> celestia-da

da: log blob submit and request

da: log prefix

da: reuse rollup config for da rpc url

da: use env var for da rpc url

da: update local celestia devnet image

da: reuse da client; bump local celestia devnet version

da: move da client to new file; fix devnet version

da: rpc - default to localhost

da: bump local-celestia-devnet:v0.12.2

da: update README

da: rm op-celestia

da: calldata version prefix, eth fallback
tuxcanfly added a commit that referenced this pull request Nov 28, 2023
batcher: fix duplicate cli flag name

docker: use latest devnet image

bump local celestia devnet to 0.9.1

docker: wait for da to boot

op-ndoe: decodeeETHData - use uint32 for height

proposer: fix da rpc, estimate gas

op-node: fix failed to parse frame error

proposer: fix revert

proposer: restore gas limit; add explanation

proposer: nit

add celestia logo, update readme

Update README.md

style size

plus not x

docs: edit intro

Update README.md

Update README.md

nit

docs: add section on deposit bridge

add workaround from @Ferret-san

Co-Authored-By: Diego <[email protected]>

close code block

Co-Authored-By: Diego <[email protected]>

fix: typo

Co-Authored-By: Diego <[email protected]>

Update README.md

docs: add bridging section from guidance from @Ferret-san

Co-Authored-By: Diego <[email protected]>

deps: bump local-devnet to 0.9.2

docker: fix da container name

op-node: gracefully retry da connection err

test: update e2e tests for celestia da

docs: update readme

deps: update local-celestia-devnet to v0.9.5

Add DALC support (#47)

* deps: add vars to devnet-up.sh script

* add new docker-compose and script

* fix: add testnet-up to makefile

* add testnet-clean and testnet-down make cmds

* match other instances of localhost

* fix: paths

fix: paths in Makefile

* remove unnecessary vars

* docs: update link in readme

* fix: use light node image

* add celestia light node image

* fix: start DA as dependency to batcher

* docs: copy edits

* docker: testnet - mount celestia light volume

* celestia: remove hardcoded key; add README

* Update README.md

* Update op-celestia/blockspacerace-data/README.md

* edit readme

* docker: celestia startup - fix permission denied

* revert changes from dde9c93

---------

Co-authored-by: Javed Khan <[email protected]>

fix: update path to node store (#52)

devnet: increase block times (#50)

* devnet: increase block times

* proposer: bump network timeout

* proposer timeout: 5 -> 5s

* proposer: tweak timeout to 180s

* fix: ymlup

---------

Co-authored-by: joshcs.eth <[email protected]>

deps: bump celestia-node & local-celestia-devnet image version to v0.10.0 (#56)

* deps: bump celestia-node v0.9.5 to v0.10.0

* change RPC

* swap RPC

* deps: bump v0.9.5 to v0.10.0 local-celestia-devnet

op-node: log namespaced data details (#94)

* op-node: log namespaced data details

* op-node: log hex

Dependabot and Fork Sync (#93)

* .github: add sync fork workflow and update dependabot to only check .github and go.mod

* .github: update dependabot.yml

fix: resolve `max_subscriptions` and` share sequence` errors (#120)

* change RPC to P-OPS

* deps: change to test version of node

celestiaorg/celestia-node#2257

* fix: version for image

* use sha

* change path...

docs: add wiki to README

proposer: allow non finalized true -> false (#133)

deps: update to node RC for Arabica (#132)

* deps: update to node RC for Arabica

* deps: update for arabica

* add changes from #133 to test

* storage in arabica-6 not arabica-8

* deps: update RPC

* deps: add host.docker.internal for testing local light node (not docker)

* Update docker-compose-devnet.yml

* Revert "deps: add host.docker.internal for testing local light node (not docker)"

This reverts commit 09490fd.

* Revert "Update docker-compose-devnet.yml"

This reverts commit cccaf66.

* increase namespace ID to 28 bytes

* update RPC and to arabica-8 for node store

* deps: bump node to rc2

* Update docker-compose-testnet.yml

* deps: bump go-cnc to 0.4.1, fix breaking changes

* Update docker-compose-devnet.yml

* bump gas fee

* revert 71a3d18

this is so that #133 can be merged before this PR

---------

Co-authored-by: Javed Khan <[email protected]>

deps: bump celestia-node to v0.11.0-rc2 (#137)

* deps: bump celestia-node to v0.11.0-rc2

* docs: match celestia-node#2277

deps: remove trailing slash (#139)

op-node: continue on retrieval error (#142)

op-node: calldata source - reset pipeline on err (#144)

op-node: da-rpc, namespace-id - use environment flags (#158)

* op-node: use global string for da flags

* op-node: use flag name

* testnet: better healthcheck

* testnet: restore rpc flags

testnet: update node rc, flags for arabica-9 (#157)

* testnet: update node rc, flags for arabica-9

* go mod: bump go-cnc to 0.4.2-pre

* testnet: bump celestia-node to v0.11.0-rc6

* txmgr: submit pfb - handle unexpected response code; bump fee

* nit: typo

* go mod: bump go-cnc to v0.4.2

* txmgr: submit pfb - handle unexpected response - check tx hash

* txmgr: bump gas limit

* txmgr: add additional height sanity check

devnet: bump local-celestia-devnet to v0.11.0-rc6 (#168)

* devnet: fix op-batcher depends_on

* devnet: bump local-celestia-devnet to v0.11.0-rc6

* devnet: da - bump start_period and interval

* testnet: use new namespace id

* fix: celestia-node version and remove deprecated flags

---------

Co-authored-by: joshcs.eth ᵍᵐ <[email protected]>

deps: local-celestia-devnet -> v0.11.0-rc8 (#175)

da: use celestia-openrpc (#187)

* op-node: use openrpc to get blobs

* op-batcher: use openrpc to submit blobs

* op-celestia-: use openrpc to fetch blobs

* da: update rpc port

* docker: update rpc port

* readme: devnet - rm unused port

* docker: devnet - restore platform

docker: testnet add debug log flag (#189)

da: celestia-openrpc - use blob commitment (#196)

* op-batcher: use openrpc to submit blobs

* op-celestia-: use openrpc to fetch blobs

* da: update rpc port

* docker: update rpc port

* readme: devnet - rm unused port

* docker: devnet - restore platform

* da: use blob commitment instead of tx index

* celestia: use binary interfaces

* go mod: tidy

* deps: bump [email protected]

* pkg celestia: test - add invalid case

* deps: restore [email protected]

da: use celestia da; update config

da: go mod tidy

da: update devnet docker-compose.yml

da: fix devnet script; op-stack-go docker; op-celestia

.github: update base repo owner (#223)

* .github: update base repo owner

* add use of PAT for sync action

---------

Co-authored-by: Matthew Sevey <[email protected]>

.github: fix base branch (#226)

op-server: bump celestia-openrpc to v0.3.0

go mod tidy

chore: update devnet version to v0.12.1 (#245)

* chore: update devnet version to v0.12.1

* docs: update readme to v0.12.1 devnet

* chore: update celestia description

* chore: update celestia-node to v0.12.0

da: move batcher changes from op-service to batcher/driver

major rewrite and tech debt cleanup

* batcher changes moved from op-service to batcher driver
* da flags consolidated into da.rpc, da.namespace-id, da.auth-token
* batcher / proposer da flags removed
* batcher fetches da config from op-node / same as rollup config

da: celestia-openrpc -> celestia-da

da: log blob submit and request

da: log prefix

da: reuse rollup config for da rpc url

da: use env var for da rpc url

da: update local celestia devnet image

da: reuse da client; bump local celestia devnet version

da: move da client to new file; fix devnet version

da: rpc - default to localhost

da: bump local-celestia-devnet:v0.12.2

da: update README

da: rm op-celestia

da: calldata version prefix, eth fallback
tuxcanfly added a commit that referenced this pull request Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants