Javed Khan edited this page May 29, 2023 · 1 revision

Celestia x OP Stack Maintenance Guide ⛑️

💡 Note that the Celestia x OP Stack integration is still under heavy development, with multiple dependencies, each with its own release schedule. Bugs in the dependencies and in the integration itself are to be expected. Testing each iteration takes time and involves multiple moving parts and unknowns such as network stability, bugs, and some known issues. The reader’s patience is appreciated 🙏

Intro

Repo: https://github.com/celestiaorg/optimism

This repository contains the integration code for OP Stack x Celestia, making the rollup a Celestium, i.e. a chain that uses Celestia as a data availability (DA) layer.

The implementation details are in https://github.com/celestiaorg/optimism/pull/3

`celestia-develop` is the main branch and tracks updates from optimism `develop`.

Glossary

  • Frame - a byte slice which is written as calldata to ethereum
  • Frame Pointer - a unique identifier for a frame, serialized as the 8-byte block height + 4-byte transaction index on celestia
  • Resolving a Frame - an additional lookup that queries the data by its unique identifier on celestia and returns it in place of the original calldata
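The frame pointer layout can be sketched in Go. This is a minimal illustration, not the canonical implementation (which lives in op-celestia); big-endian encoding is an assumption here, consistent with the batcher TxData output shown later in this page:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// FramePointer identifies a frame on celestia: an 8-byte block height
// followed by a 4-byte transaction index (big-endian assumed).
type FramePointer struct {
	Height  uint64
	TxIndex uint32
}

// Marshal serializes the pointer into the 12-byte calldata layout.
func (p FramePointer) Marshal() []byte {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint64(buf[0:8], p.Height)
	binary.BigEndian.PutUint32(buf[8:12], p.TxIndex)
	return buf
}

// UnmarshalFramePointer parses 12 bytes of calldata back into a pointer.
func UnmarshalFramePointer(data []byte) (FramePointer, error) {
	if len(data) != 12 {
		return FramePointer{}, fmt.Errorf("expected 12 bytes, got %d", len(data))
	}
	return FramePointer{
		Height:  binary.BigEndian.Uint64(data[0:8]),
		TxIndex: binary.BigEndian.Uint32(data[8:12]),
	}, nil
}

func main() {
	p := FramePointer{Height: 180, TxIndex: 0}
	data := p.Marshal()
	fmt.Println(data) // [0 0 0 0 0 0 0 180 0 0 0 0]
	back, _ := UnmarshalFramePointer(data)
	fmt.Println(back.Height, back.TxIndex) // 180 0
}
```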

Good to know

The gist of the integration:

Wherever `SimpleTxManager.send` was calling `craftTx` directly with the calldata, it should now call `SubmitPFB` on the celestia client with the data, and post the resulting Frame Pointer as calldata to the batch inbox address on ethereum instead.

Where `DataFromEVMTransactions` previously read transaction calldata and returned frames directly, it should now resolve the frame from the frame pointer on ethereum and return that instead.
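The write path above can be sketched with a hypothetical DA client; the interface, method signature, and return values here are illustrative stand-ins, not the real celestia client API:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// DAClient is a hypothetical stand-in for the celestia client used by the
// batcher; the real SubmitPFB call takes more parameters (namespace, fee, ...).
type DAClient interface {
	SubmitPFB(data []byte) (height uint64, txIndex uint32, err error)
}

// mockDA pretends the blob landed at a fixed location on celestia.
type mockDA struct{}

func (mockDA) SubmitPFB(data []byte) (uint64, uint32, error) { return 180, 0, nil }

// buildCalldata shows the batcher-side change: instead of posting the raw
// frame to the batch inbox, submit it to celestia and post the 12-byte
// frame pointer (8-byte height + 4-byte tx index) as calldata instead.
func buildCalldata(da DAClient, frame []byte) ([]byte, error) {
	height, txIndex, err := da.SubmitPFB(frame)
	if err != nil {
		return nil, err
	}
	ptr := make([]byte, 12)
	binary.BigEndian.PutUint64(ptr[0:8], height)
	binary.BigEndian.PutUint32(ptr[8:12], txIndex)
	return ptr, nil
}

func main() {
	calldata, _ := buildCalldata(mockDA{}, []byte("frame bytes"))
	fmt.Println(calldata) // a 12-byte pointer instead of the original frame
}
```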

Debug statements in the batcher are added to print the status of the SubmitPFB call and can be viewed with:

$ docker logs -f ops-bedrock_op-batcher_1

💡 Note that the name for the docker container can be different depending on the OS

res: &{180 86461278EDB28D7C775A70DF81AA8376ED37B13D398FA6B7DDDDA858531BDD17  0 122A0A282F63656C65737469612E626C6F622E76312E4D7367506179466F72426C6F6273526573706F6E7365 [{"msg_index":0,"events":[{"type":"celestia.blob.v1.EventPayForBlobs","attributes":[{"key":"blob_sizes","value":"[779]"},{"key":"namespace_ids","value":"[\"6OX2eb9xFss=\"]"},{"key":"signer","value":"\"celestia1vdjkcetnw35kzvt4wdkrs7ne0yekuafk0yukgu3ewf58gctn09krxctxd4jk67fsd3hrw7thw93sxhykdq\""}]},{"type":"message","attributes":[{"key":"action","value":"/celestia.blob.v1.MsgPayForBlobs"}]}]}] [{0  [{celestia.blob.v1.EventPayForBlobs [{blob_sizes [779]} {namespace_ids ["6OX2eb9xFss="]} {signer "celestia1vdjkcetnw35kzvt4wdkrs7ne0yekuafk0yukgu3ewf58gctn09krxctxd4jk67fsd3hrw7thw93sxhykdq"}]} {message [{action /celestia.blob.v1.MsgPayForBlobs}]}]}]  700000 70562 nil }
TxData: [0 0 0 0 0 0 0 180 0 0 0 0]

Resolving the frame can be done manually by calling a utility, op-celestia/main.go, that deserializes the frame pointer from the batcher calldata, queries celestia, and returns the blob, e.g.:

$ cd op-celestia

$ go run main.go 000000000000008100000000

Ideally the rollup node should be resolving the frame pointers from celestia and seamlessly processing L2 blocks, even though the block data is now stored on celestia:

$ docker logs -f ops-bedrock_op-node_1

t=2023-05-18T17:49:02+0000 lvl=info msg="Sync progress"                          reason="reconciled with L1"    l2_finalized=0x5e44050144ca1ea32218b1cb250c73b282f338c821fb127dd6771d8f85e85297:0 l2_safe=0xd7b51acb75948e67800786644ad729f36fec951cdfc4eb1f07d46992e2d44174:1234 l2_unsafe=0x90dad3710af8604e9b673d9096c1e71b634e93ed83c3e6f1501fb0276b6b6ae4:1240 l2_time=1,684,432,142 l1_derived=0xdc4fa4d25730b7dd818a3400a93e4af7893e7f2085f90326c55575a10bd71e25:701
t=2023-05-18T17:49:02+0000 lvl=info msg="Found next batch"                       epoch=0xb236ece1d1ee998e5d1821d696f43237d1e3045148b5322946727f56d87df902:696 batch_epoch=696 batch_timestamp=1,684,432,132
t=2023-05-18T17:49:02+0000 lvl=info msg="generated attributes in payload queue"  txs=1 timestamp=1,684,432,132

As long as the batcher is able to submit frame pointers and the node is able to resolve those using celestia, the rollup should proceed normally 🚀

💡 Note that as of the latest release v0.1.0-OP_v1.0.6-CN_v0.10.0 we are only submitting batch calldata to celestia, not proposer data, even though a common service op-service handles all data submission. See https://github.com/celestiaorg/optimism/issues/27

Debugging

In order of ease of debugging, one can try:

  • run the e2e tests (see Testing below)
  • run the devnet and test manually
  • run the devnet against robusta

Most issues can be reproduced locally and should be debugged on local devnets.

If for any reason an issue cannot be reproduced locally and depends on testnet, here are some ideas for debugging:

  • Check op-node or op-batcher logs to identify the root cause of the re-org
  • Check celestia node logs for a corresponding failure in read / write to the DA layer
  • Ask node team for hints on the error collected from node logs
  • Modify local-celestia-devnet to tweak network params to match the testnet
  • Try to reproduce the error on an automated network like robusta (same network params)

Testing

Manual testing involves running the devnet and monitoring for errors:

$ make devnet-up

$ cd ops-bedrock

$ docker-compose logs -f

Ideally there should not be any errors or warnings, except minor connection issues or timeouts; these are retried automatically and can be safely ignored.

There’s also a testnet integration to run the instance against a celestia testnet like BlockSpaceRace. The commands to start and monitor it are the same as for the devnet:

$ make testnet-up

$ cd ops-bedrock

$ docker-compose logs -f

💡 Note that running the testnet integration requires a light node to be fully synced with the network. This is because `op-node` needs a celestia node ready to be queried; otherwise it will fail to sync the chain from L1 and re-org.

The Optimism test suite has e2e tests which are useful for automated testing. These e2e tests have also been updated to use the celestia DA integration, so they can be triggered with:

$ cd op-e2e

$ make test

This requires a celestia node listening on [http://localhost:26659](http://localhost:26659) before the tests can run.

All tests passing indicates that there is no obvious integration issue. However there can be issues with the setup that are only noticed after some uptime due to network issues. See Known Issues.

Release

Track Optimism releases, e.g.: https://github.com/ethereum-optimism/optimism/compare/v1.0.6...v1.0.7

Our changes are mainly in txmgr/txmgr.go and rollup/derive/calldata_source.go.

Merge conflicts should be resolved keeping in mind the overall changes in the PR, i.e. any change that affects read / write of calldata should resolve frame pointers on ethereum to frame data from celestia.

Whenever a release is needed, our main branch celestia-develop should be rebased onto the latest tagged version of optimism develop.

Tag

Whenever there’s a new release, a new tag should be created in the format <integration-version>-OP_<optimism-version>-CN_<celestia-node-version>.

e.g. the latest release at this time, v0.1.0-OP_v1.0.6-CN_v0.10.0, tells us that we are at Optimism v1.0.6, Celestia Node v0.10.0, and integration v0.1.0.
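The tag format above can be split mechanically, e.g. with a small sketch like the following (it assumes tags always follow the three-part pattern; `parseReleaseTag` is a hypothetical helper, not part of the repo):

```go
package main

import (
	"fmt"
	"strings"
)

// parseReleaseTag splits <integration>-OP_<optimism>-CN_<celestia-node>
// into its three component versions.
func parseReleaseTag(tag string) (integration, optimism, celestiaNode string, err error) {
	opIdx := strings.Index(tag, "-OP_")
	cnIdx := strings.Index(tag, "-CN_")
	if opIdx < 0 || cnIdx < 0 || cnIdx < opIdx {
		return "", "", "", fmt.Errorf("unexpected tag format: %s", tag)
	}
	return tag[:opIdx], tag[opIdx+4 : cnIdx], tag[cnIdx+4:], nil
}

func main() {
	i, o, c, _ := parseReleaseTag("v0.1.0-OP_v1.0.6-CN_v0.10.0")
	fmt.Println(i, o, c) // v0.1.0 v1.0.6 v0.10.0
}
```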

Future Scope

Misc:

  • Migrate away from block height + tx index to a transaction commitment (maybe a hash). See #127
  • Use QGB to verify that the data roots are committed. See #128

Tests:

  • Mock DA integration test See #39
  • Serialization / Deserialization test See #125

Known Issues

As of the latest release at this time, v0.1.0-OP_v1.0.6-CN_v0.10.0, the devnet is fairly stable.

However there are a few known issues on testnet (make testnet-up):

  • L2 reorg: existing unsafe block does not match derived attributes from L1. Since op-node reads the data optimistically and doesn’t throw an error when it fails to parse the frames or otherwise read blocks, it re-orgs instead
    • this could be caused by any error in the integration that makes reading / writing frames fail, including serialization and connection issues
  • max_subscriptions_per_client reached - happening on testnet (possibly because of batcher congestion due to the L1 block time mismatch between ethereum and celestia)