fix(prover): Disallow state changes from successful #2233

Merged 5 commits into main from evl-post-outage-fixes on Jun 13, 2024


EmilLuta
Contributor

This PR is a fix for the boojnet outage.

TL;DR of the outage: a race condition caused by prover jobs moving from the `successful` state to `in_progress`/`in_gpu_proving`.

The PR addresses:

  • No job can move out of the `successful` state, which is considered a final state.
  • Fix local development: contracts were pointing to 0.24.0 instead of 0.24.1. This can be split into a separate PR if it is problematic.
  • Add a table constraint. Again, this can be split into a separate PR.
  • Add checks on the number of recursion_tip jobs. This is a post-outage check for a situation that should never happen, but it is better to verify.
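
To illustrate the first point, here is a minimal sketch of treating `successful` as a final state. It is not the code from this PR: the enum, the `Queued` and `Failed` variants, and the `transition` helper are hypothetical names chosen for illustration, and the real fix presumably lives in the database queries rather than in an in-memory type.

```rust
/// Hypothetical job statuses. `InProgress`, `InGpuProving` and `Successful`
/// mirror the states named in the PR description; `Queued` and `Failed`
/// are assumptions added to make the example complete.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    InProgress,
    InGpuProving,
    Successful,
    Failed,
}

impl JobStatus {
    /// `Successful` is treated as final: nothing may leave it.
    pub fn is_final(self) -> bool {
        matches!(self, JobStatus::Successful)
    }
}

/// Error describing a rejected transition out of a final state.
#[derive(Debug, PartialEq, Eq)]
pub struct FinalStateError {
    pub from: JobStatus,
    pub to: JobStatus,
}

/// Returns the new status, or an error if `current` is already final.
pub fn transition(current: JobStatus, next: JobStatus) -> Result<JobStatus, FinalStateError> {
    if current.is_final() {
        return Err(FinalStateError { from: current, to: next });
    }
    Ok(next)
}
```

With this guard, a late worker that tries to move a finished job back to `InProgress` gets an error instead of silently reopening the job, which is the kind of race described in the TL;DR.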

EmilLuta added 2 commits June 13, 2024 15:04
@EmilLuta EmilLuta marked this pull request as draft June 13, 2024 13:07
@EmilLuta EmilLuta changed the title fix(prover): Disallow state changes from successful fix: Disallow state changes from successful Jun 13, 2024
@EmilLuta EmilLuta changed the title fix: Disallow state changes from successful fix(prover): Disallow state changes from successful Jun 13, 2024
@EmilLuta EmilLuta marked this pull request as ready for review June 13, 2024 14:42
@perekopskiy perekopskiy changed the title fix(prover): Disallow state changes from successful fix(prover): Disallow state changes from successful Jun 13, 2024
@EmilLuta EmilLuta added this pull request to the merge queue Jun 13, 2024
Merged via the queue into main with commit 2488a76 Jun 13, 2024
55 checks passed
@EmilLuta EmilLuta deleted the evl-post-outage-fixes branch June 13, 2024 16:24
i64::from(block_number.0)
i64::from(block_number.0),
ProofCompressionJobStatus::Successful.to_string(),
ProofCompressionJobStatus::SentToServer.to_string(),
Collaborator

IMO it would be worth logging such occurrences. This would make it easier to debug any further occurrences.
You can make the update return the number of affected rows and then compare it with 1.

Contributor Author

Agreed, but we can already tell this from the number of attempts in the DB. I'm not opposed to the idea.
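
The suggestion in this thread can be sketched as follows. This is an illustrative, in-memory stand-in rather than the project's database code: `JobsTable`, `update_status`, and `update_and_check` are hypothetical names, and the `HashMap` only mimics an `UPDATE ... WHERE status != 'successful'` query. In a SQL client the count would typically come from the query result's affected-rows value.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a jobs table: block number -> status string.
/// (Hypothetical; the real table lives in the prover database.)
pub type JobsTable = HashMap<i64, String>;

/// Mimics a guarded update that skips rows already in the `successful`
/// state and returns the number of affected rows (0 or 1 here).
pub fn update_status(table: &mut JobsTable, block_number: i64, new_status: &str) -> u64 {
    match table.get_mut(&block_number) {
        Some(status) if status.as_str() != "successful" => {
            *status = new_status.to_string();
            1
        }
        // Row is missing or already final: nothing is changed.
        _ => 0,
    }
}

/// Runs the update and logs when it did not touch exactly one row,
/// as proposed in the review comment. Returns true on the expected count.
pub fn update_and_check(table: &mut JobsTable, block_number: i64, new_status: &str) -> bool {
    let affected = update_status(table, block_number, new_status);
    if affected != 1 {
        eprintln!(
            "status update for block {block_number} to {new_status} affected {affected} rows, expected 1"
        );
    }
    affected == 1
}
```

Logging the unexpected count gives a direct signal at the moment a rejected transition happens, whereas the attempts counter mentioned in the reply only shows it after the fact.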

github-merge-queue bot pushed a commit that referenced this pull request Jun 14, 2024
🤖 I have created a release *beep* *boop*
---


## [15.0.0](prover-v14.5.0...prover-v15.0.0) (2024-06-14)


### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler
([#2126](#2126))

### Features

* added debug_proof to prover_cli
([#2052](#2052))
([b1ad01b](b1ad01b))
* faster & cleaner VK generation
([#2084](#2084))
([89c8cac](89c8cac))
* **node:** Move some stuff around
([#2151](#2151))
([bad5a6c](bad5a6c))
* **object-store:** Allow caching object store objects locally
([#2153](#2153))
([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface
API ([#1993](#1993))
([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway
([#2150](#2150))
([81ffc6a](81ffc6a))
* **prover:** file based configs for witness generator
([#2161](#2161))
([24b8f93](24b8f93))
* support debugging of recursive circuits in prover_cli
([#2217](#2217))
([7d2e12d](7d2e12d))
* updated boojum and nightly rust compiler
([#2126](#2126))
([9e39f13](9e39f13))
* verification of L1Batch witness (BFT-471) - attempt 2
([#2232](#2232))
([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471)
([#2019](#2019))
([6cc5455](6cc5455))


### Bug Fixes

* **config:** Split object stores
([#2187](#2187))
([9bcdabc](9bcdabc))
* **prover_cli:** Fix delete command
([#2119](#2119))
([214f981](214f981))
* **prover_cli:** Fix the issues with `home` path
([#2104](#2104))
([1e18af2](1e18af2))
* **prover:** config
([#2165](#2165))
([e5daf8e](e5daf8e))
* **prover:** Disallow state changes from successful
([#2233](#2233))
([2488a76](2488a76))
* Treat 502s and 503s as transient for GCS OS
([#2202](#2202))
([0a12c52](0a12c52))


### Reverts

* verification of L1Batch witness (BFT-471)
([#2230](#2230))
([227e101](227e101))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
github-merge-queue bot pushed a commit that referenced this pull request Jun 24, 2024
🤖 I have created a release *beep* *boop*
---


## [24.8.0](core-v24.7.0...core-v24.8.0) (2024-06-24)


### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler
([#2126](#2126))

### Features

* Add metrics for transaction execution result in state keeper
([#2021](#2021))
([dde0fc4](dde0fc4))
* **api:** Add new `l1_committed` block tag
([#2282](#2282))
([d5e8e9b](d5e8e9b))
* **api:** Rework zks_getProtocolVersion
([#2146](#2146))
([800b8f4](800b8f4))
* change `zkSync` occurences to `ZKsync`
([#2227](#2227))
([0b4104d](0b4104d))
* **contract-verifier:** Adjust contract verifier for zksolc 1.5.0
([#2255](#2255))
([63efb2e](63efb2e))
* **docs:** Add documentation for subset of wiring layer
implementations, used by Main node
([#2292](#2292))
([06c287b](06c287b))
* **docs:** Pruning and Snapshots recovery basic docs
([#2265](#2265))
([619a525](619a525))
* **en:** Allow recovery from specific snapshot
([#2137](#2137))
([ac61fed](ac61fed))
* **eth-sender:** fix for missing eth_txs_history entries
([#2236](#2236))
([f05b0ae](f05b0ae))
* Expose fair_pubdata_price for blocks and batches
([#2244](#2244))
([0d51cd6](0d51cd6))
* **merkle-tree:** Rework tree rollback
([#2207](#2207))
([c3b9c38](c3b9c38))
* **node-framework:** Add Main Node Client layer
([#2132](#2132))
([927d842](927d842))
* **node:** Move some stuff around
([#2151](#2151))
([bad5a6c](bad5a6c))
* **node:** Port (most of) Node to the Node Framework
([#2196](#2196))
([7842bc4](7842bc4))
* **object-store:** Allow caching object store objects locally
([#2153](#2153))
([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface
API ([#1993](#1993))
([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway
([#2150](#2150))
([81ffc6a](81ffc6a))
* Remove initialize_components function
([#2284](#2284))
([0a38891](0a38891))
* **state-keeper:** Add metric for l2 block seal reason
([#2229](#2229))
([f967e6d](f967e6d))
* **state-keeper:** More state keeper metrics
([#2224](#2224))
([1e48cd9](1e48cd9))
* **sync-layer:** adapt MiniMerkleTree to manage priority queue
([#2068](#2068))
([3e72364](3e72364))
* **tee_verifier_input_producer:** use
`FactoryDepsDal::get_factory_deps()`
([#2271](#2271))
([2c0a00a](2c0a00a))
* **toolbox:** add zk_toolbox ci
([#1985](#1985))
([4ab4922](4ab4922))
* updated boojum and nightly rust compiler
([#2126](#2126))
([9e39f13](9e39f13))
* upgraded encoding of transactions in consensus Payload.
([#2245](#2245))
([cb6a6c8](cb6a6c8))
* Use info log level for crates named zksync_* by default
([#2296](#2296))
([9303142](9303142))
* verification of L1Batch witness (BFT-471) - attempt 2
([#2232](#2232))
([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471)
([#2019](#2019))
([6cc5455](6cc5455))
* **vm-runner:** add basic metrics
([#2203](#2203))
([dd154f3](dd154f3))
* **vm-runner:** add protective reads persistence flag for state keeper
([#2307](#2307))
([36d2eb6](36d2eb6))
* **vm-runner:** shadow protective reads using VM runner
([#2017](#2017))
([1402dd0](1402dd0))


### Bug Fixes

* **api:** Fix getting pending block
([#2186](#2186))
([93315ba](93315ba))
* **api:** Fix transaction methods for pruned transactions
([#2168](#2168))
([00c4cca](00c4cca))
* **config:** Fix object store
([#2183](#2183))
([551cdc2](551cdc2))
* **config:** Split object stores
([#2187](#2187))
([9bcdabc](9bcdabc))
* **db:** Fix `insert_proof_generation_details()`
([#2291](#2291))
([c2412cf](c2412cf))
* **db:** Optimize `get_l2_blocks_to_execute_for_l1_batch`
([#2199](#2199))
([06ec5f3](06ec5f3))
* **en:** Fix reorg detection in presence of tree data fetcher
([#2197](#2197))
([20da566](20da566))
* **en:** Fix transient error detection in consistency checker
([#2140](#2140))
([38fdfe0](38fdfe0))
* **en:** Remove L1 client health check
([#2136](#2136))
([49198f6](49198f6))
* **eth-sender:** Don't resend already sent transactions in the same
block ([#2208](#2208))
([3538e9c](3538e9c))
* **eth-sender:** etter error handling in eth-sender
([#2163](#2163))
([0cad504](0cad504))
* **node_framework:** Run gas adjuster task only if necessary
([#2266](#2266))
([2dac846](2dac846))
* **object-store:** Consider more GCS errors transient
([#2246](#2246))
([2f6cd41](2f6cd41))
* **prover_cli:** Remove outdated fix for circuit id in node wg
([#2248](#2248))
([db8e71b](db8e71b))
* **prover:** Disallow state changes from successful
([#2233](#2233))
([2488a76](2488a76))
* **pruning:** Check pruning in metadata calculator
([#2286](#2286))
([7bd8f27](7bd8f27))
* Treat 502s and 503s as transient for GCS OS
([#2202](#2202))
([0a12c52](0a12c52))
* **vm-runner:** add config value for the first processed batch
([#2158](#2158))
([f666717](f666717))
* **vm-runner:** make `last_ready_batch` account for
`first_processed_batch`
([#2238](#2238))
([3889794](3889794))
* **vm:** fix insertion to `decommitted_code_hashes`
([#2275](#2275))
([15bb71e](15bb71e))
* **vm:** Update `decommitted_code_hashes` in `prepare_to_decommit`
([#2253](#2253))
([6c49a50](6c49a50))


### Performance Improvements

* **db:** Improve storage switching for state keeper cache
([#2234](#2234))
([7c8e24c](7c8e24c))
* **db:** Try yet another storage log pruning approach
([#2268](#2268))
([3ee34be](3ee34be))
* **en:** Parallelize persistence and chunk processing during tree
recovery
([#2050](#2050))
([b08a667](b08a667))
* **pruning:** Use more efficient query to delete past storage logs
([#2179](#2179))
([4c18755](4c18755))


### Reverts

* **pruning:** Revert pruning query
([#2220](#2220))
([8427cdd](8427cdd))
* verification of L1Batch witness (BFT-471)
([#2230](#2230))
([227e101](227e101))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
irnb pushed a commit to vianetwork/via-server that referenced this pull request Jul 12, 2024