fix(external-node): delete empty unsealed batch on EN initialization #3125
)" This reverts commit bb5d147.
Can a similar situation reproduce on the main node (i.e., an unsealed batch w/o blocks / transactions)? Is it handled OK?
- AFAICT, `MempoolIO::initialize()` can insert an unsealed L1 batch via `ensure_unsealed_l1_batch_exists` (as off-topic / bikeshedding, I think this method is too high-level to be placed in DAL). OTOH, we only reach the `ensure_unsealed_l1_batch_exists` call if `L1BatchParamsProvider::load_l1_batch_env()` above returns `Some(_)`, i.e. there is at least one L2 block persisted for the batch. So it doesn't look like `ensure_unsealed_l1_batch_exists` may actually persist a batch without blocks.
- In `MempoolIO::wait_for_new_batch_params()`, a new batch is inserted into the storage proactively. If a node crashes immediately afterwards, it looks like the situation can be reproduced.
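To make the second scenario concrete, here is a minimal sketch of the crash window, assuming a purely hypothetical in-memory `Storage` type. None of the names below (`Storage`, `open_batch`, `persist_block`, `has_empty_unsealed_batch`, `crash_right_after_open`) come from the zksync-era codebase; they only model "batch row inserted proactively, no L2 block persisted yet".

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for node storage (NOT the real DAL).
#[derive(Debug, Default)]
struct Storage {
    /// Number of the currently unsealed L1 batch, if any.
    unsealed_batch: Option<u32>,
    /// L2 block number -> number of the L1 batch it belongs to.
    l2_blocks: BTreeMap<u32, u32>,
}

impl Storage {
    /// Models the proactive insert of a new unsealed batch.
    /// Sealing of the previous batch is not modelled.
    fn open_batch(&mut self, batch: u32) {
        self.unsealed_batch = Some(batch);
    }

    /// Persists an L2 block that belongs to the given batch.
    fn persist_block(&mut self, block: u32, batch: u32) {
        self.l2_blocks.insert(block, batch);
    }

    /// Returns true if an unsealed batch exists but no L2 block references it.
    fn has_empty_unsealed_batch(&self) -> bool {
        match self.unsealed_batch {
            Some(batch) => !self.l2_blocks.values().any(|b| *b == batch),
            None => false,
        }
    }
}

/// Simulates a node that opens batch 2 and "crashes" before persisting
/// any block for it; the returned value is what a restart would observe.
fn crash_right_after_open() -> Storage {
    let mut storage = Storage::default();
    storage.open_batch(1);
    storage.persist_block(1, 1);
    storage.open_batch(2);
    // Crash point: no `persist_block(_, 2)` call ever happens.
    storage
}
```

In this toy model, `crash_right_after_open().has_empty_unsealed_batch()` is `true`, which is the state the question is about; whether the real main node can end up there depends on the fail-safe discussed in the replies.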
I believe that is handled OK as main node has a slightly different mechanism: But I will write a test just to validate that everything is ok.
Ok, so the logic is a bit confusing and I should have left a comment but essentially this is a compatibility mechanism. Imagine that EN starts for the very first time after unsealed batches PR made it in - we have
Fair, this was a bit of a dirty hack, to be honest, to fix stage, which was in a crash loop at the time. This entire logic is IMHO something that should've been a Rust migration script, but AFAICT we don't have anything like that. Anyway, I'll try to refactor this in a separate PR.
As I said there is actually a fail-safe mechanism at the very top of the function that prevents this from happening. Unless you mean something else?
@slowli I am merging this PR, but thanks for the critical review, and if you still have questions/concerns please feel free to continue the discussion here. I will deliver extra tests/improvements in a follow-up PR.
…3125)

## What ❔

This PR reverts #3088 as I have realized it is going to be very hard to make this fix work by going in that direction. Basically initializing with an empty unsealed batch causes a lot of issues and the existing state keeper/external IO flow heavily relies on us having at least one at the start to initialize correctly. Will leave more context in the comments.

Feel free to review individual commits to not see revert changelog.

## Why ❔

This bug causes EN to panic

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev lint`.
🤖 I have created a release *beep* *boop*

---

## [25.0.0](core-v24.29.0...core-v25.0.0) (2024-10-23)

### ⚠ BREAKING CHANGES

* **contracts:** integrate protocol defense changes ([#2737](#2737))

### Features

* Add CoinMarketCap external API ([#2971](#2971)) ([c1cb30e](c1cb30e))
* **api:** Implement eth_maxPriorityFeePerGas ([#3135](#3135)) ([35e84cc](35e84cc))
* **api:** Make acceptable values cache lag configurable ([#3028](#3028)) ([6747529](6747529))
* **contracts:** integrate protocol defense changes ([#2737](#2737)) ([c60a348](c60a348))
* **external-node:** save protocol version before opening a batch ([#3136](#3136)) ([d6de4f4](d6de4f4))
* Prover e2e test ([#2975](#2975)) ([0edd796](0edd796))
* **prover:** Add min_provers and dry_run features. Improve metrics and test. ([#3129](#3129)) ([7c28964](7c28964))
* **tee_verifier:** speedup SQL query for new jobs ([#3133](#3133)) ([30ceee8](30ceee8))
* vm2 tracers can access storage ([#3114](#3114)) ([e466b52](e466b52))
* **vm:** Return compressed bytecodes from `push_transaction()` ([#3126](#3126)) ([37f209f](37f209f))

### Bug Fixes

* **call_tracer:** Flat call tracer fixes for blocks ([#3095](#3095)) ([30ddb29](30ddb29))
* **consensus:** preventing config update reverts ([#3148](#3148)) ([caee55f](caee55f))
* **en:** Return `SyncState` health check ([#3142](#3142)) ([abeee81](abeee81))
* **external-node:** delete empty unsealed batch on EN initialization ([#3125](#3125)) ([5d5214b](5d5214b))
* Fix counter metric type to be Counter. ([#3153](#3153)) ([08a3fe7](08a3fe7))
* **mempool:** minor mempool improvements ([#3113](#3113)) ([cd16083](cd16083))
* **prover:** Run for zero queue to allow scaling down to 0 ([#3115](#3115)) ([bbe1919](bbe1919))
* restore instruction count functionality ([#3081](#3081)) ([6159f75](6159f75))
* **state-keeper:** save call trace for upgrade txs ([#3132](#3132)) ([e1c363f](e1c363f))
* **tee_prover:** add zstd compression ([#3144](#3144)) ([7241ae1](7241ae1))
* **tee_verifier:** correctly initialize storage for re-execution ([#3017](#3017)) ([9d88373](9d88373))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
What ❔
This PR reverts #3088 as I have realized it is going to be very hard to make this fix work by going in that direction. Basically initializing with an empty unsealed batch causes a lot of issues and the existing state keeper/external IO flow heavily relies on us having at least one at the start to initialize correctly. Will leave more context in the comments.
Feel free to review individual commits to not see revert changelog.
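The PR title names the approach taken instead: delete an empty unsealed batch on EN initialization. A minimal sketch of that idea follows, assuming hypothetical types (`UnsealedBatch`, `InitAction`) and a hypothetical `initialize` function; it does not reproduce the actual external IO code or its storage queries.

```rust
/// Hypothetical view of the unsealed batch an external node finds on startup.
#[derive(Debug, Clone, PartialEq)]
struct UnsealedBatch {
    number: u32,
    /// How many L2 blocks are already persisted for this batch.
    l2_block_count: usize,
}

/// Hypothetical outcome of the initialization check.
#[derive(Debug, PartialEq)]
enum InitAction {
    /// No unsealed batch was found in storage.
    NothingToDo,
    /// The unsealed batch has blocks, so the node resumes it.
    ResumeBatch(u32),
    /// The unsealed batch had no blocks and was removed.
    DeletedEmptyBatch(u32),
}

/// Sketch of the startup check: drop an unsealed batch that has no L2 blocks,
/// so the rest of initialization never sees an empty unsealed batch.
fn initialize(unsealed: &mut Option<UnsealedBatch>) -> InitAction {
    match unsealed.take() {
        None => InitAction::NothingToDo,
        // Empty batch: leave the slot as `None`, i.e. "deleted".
        Some(batch) if batch.l2_block_count == 0 => {
            InitAction::DeletedEmptyBatch(batch.number)
        }
        // Non-empty batch: put it back untouched and resume from it.
        Some(batch) => {
            let number = batch.number;
            *unsealed = Some(batch);
            InitAction::ResumeBatch(number)
        }
    }
}
```

The design point this illustrates is that the cleanup happens once, up front, rather than teaching every later step of the state keeper / external IO flow to cope with a batch that has no blocks.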
Why ❔
This bug causes EN to panic
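Checklist
- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev lint`.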