
fix: Treat 502s and 503s as transient for GCS OS #2202

Merged: 2 commits, Jun 11, 2024

EmilLuta (Contributor)

A recent refactoring caused provers (namely BWGs) to restart repeatedly whenever GCS was unavailable and returned a 502 or 503. This happens only sporadically, but each occurrence invalidates tens of minutes of work and makes proving flaky and slow. This PR restores the pre-refactoring behavior by treating 502s and 503s as transient errors.

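The idea behind the fix is that a 502 or 503 from GCS should trigger a retry instead of surfacing as a fatal error that restarts the prover. The following is a minimal, std-only Rust sketch of that pattern under stated assumptions: `StoreError`, `is_transient`, and `with_retries` are hypothetical names, not the actual `zksync_object_store` API, and the exponential backoff policy is an illustrative choice rather than what the PR implements.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type for an object-store request (illustrative only;
/// not the real `zksync_object_store` error type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StoreError {
    /// The server responded with this HTTP status code.
    Http(u16),
    /// The requested object does not exist; retrying will not help.
    NotFound,
}

impl StoreError {
    /// Returns `true` if the error is likely to clear up on its own.
    /// Only 502 (Bad Gateway) and 503 (Service Unavailable) are treated as
    /// transient here, mirroring the PR description.
    pub fn is_transient(&self) -> bool {
        matches!(self, StoreError::Http(502) | StoreError::Http(503))
    }
}

/// Runs `op` up to `max_attempts` times in total (always at least once),
/// sleeping with exponential backoff (`base_delay`, `2 * base_delay`, ...)
/// between attempts. Only transient errors are retried; any other error, or
/// the last transient one once attempts run out, is returned to the caller.
pub fn with_retries<T, F>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: F,
) -> Result<T, StoreError>
where
    F: FnMut() -> Result<T, StoreError>,
{
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_transient() && attempt < max_attempts => {
                // Back off and try again instead of failing the whole job.
                let factor = 2u32.saturating_pow(attempt - 1);
                thread::sleep(base_delay * factor);
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```

In this sketch, a caller would wrap each GCS read or write in `with_retries`, so a brief outage costs a few seconds of backoff rather than a prover restart, while permanent errors such as a missing object still fail fast.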
EmilLuta requested review from slowli and popzxc on Jun 11, 2024 08:42
slowli previously approved these changes on Jun 11, 2024
EmilLuta enabled auto-merge on Jun 11, 2024 11:21
EmilLuta added this pull request to the merge queue on Jun 11, 2024
Merged via the queue into main with commit 0a12c52 on Jun 11, 2024
55 checks passed
EmilLuta deleted the evl-fix-retry-for-gcs branch on Jun 11, 2024 16:34
github-merge-queue bot pushed a commit that referenced this pull request Jun 14, 2024
🤖 I have created a release *beep* *boop*
---


## [15.0.0](prover-v14.5.0...prover-v15.0.0) (2024-06-14)


### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler
([#2126](#2126))

### Features

* added debug_proof to prover_cli
([#2052](#2052))
([b1ad01b](b1ad01b))
* faster & cleaner VK generation
([#2084](#2084))
([89c8cac](89c8cac))
* **node:** Move some stuff around
([#2151](#2151))
([bad5a6c](bad5a6c))
* **object-store:** Allow caching object store objects locally
([#2153](#2153))
([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface
API ([#1993](#1993))
([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway
([#2150](#2150))
([81ffc6a](81ffc6a))
* **prover:** file based configs for witness generator
([#2161](#2161))
([24b8f93](24b8f93))
* support debugging of recursive circuits in prover_cli
([#2217](#2217))
([7d2e12d](7d2e12d))
* updated boojum and nightly rust compiler
([#2126](#2126))
([9e39f13](9e39f13))
* verification of L1Batch witness (BFT-471) - attempt 2
([#2232](#2232))
([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471)
([#2019](#2019))
([6cc5455](6cc5455))


### Bug Fixes

* **config:** Split object stores
([#2187](#2187))
([9bcdabc](9bcdabc))
* **prover_cli:** Fix delete command
([#2119](#2119))
([214f981](214f981))
* **prover_cli:** Fix the issues with `home` path
([#2104](#2104))
([1e18af2](1e18af2))
* **prover:** config
([#2165](#2165))
([e5daf8e](e5daf8e))
* **prover:** Disallow state changes from successful
([#2233](#2233))
([2488a76](2488a76))
* Treat 502s and 503s as transient for GCS OS
([#2202](#2202))
([0a12c52](0a12c52))


### Reverts

* verification of L1Batch witness (BFT-471)
([#2230](#2230))
([227e101](227e101))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
github-merge-queue bot pushed a commit that referenced this pull request Jun 24, 2024
🤖 I have created a release *beep* *boop*
---


## [24.8.0](core-v24.7.0...core-v24.8.0) (2024-06-24)


### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler
([#2126](#2126))

### Features

* Add metrics for transaction execution result in state keeper
([#2021](#2021))
([dde0fc4](dde0fc4))
* **api:** Add new `l1_committed` block tag
([#2282](#2282))
([d5e8e9b](d5e8e9b))
* **api:** Rework zks_getProtocolVersion
([#2146](#2146))
([800b8f4](800b8f4))
* change `zkSync` occurrences to `ZKsync`
([#2227](#2227))
([0b4104d](0b4104d))
* **contract-verifier:** Adjust contract verifier for zksolc 1.5.0
([#2255](#2255))
([63efb2e](63efb2e))
* **docs:** Add documentation for subset of wiring layer
implementations, used by Main node
([#2292](#2292))
([06c287b](06c287b))
* **docs:** Pruning and Snapshots recovery basic docs
([#2265](#2265))
([619a525](619a525))
* **en:** Allow recovery from specific snapshot
([#2137](#2137))
([ac61fed](ac61fed))
* **eth-sender:** fix for missing eth_txs_history entries
([#2236](#2236))
([f05b0ae](f05b0ae))
* Expose fair_pubdata_price for blocks and batches
([#2244](#2244))
([0d51cd6](0d51cd6))
* **merkle-tree:** Rework tree rollback
([#2207](#2207))
([c3b9c38](c3b9c38))
* **node-framework:** Add Main Node Client layer
([#2132](#2132))
([927d842](927d842))
* **node:** Move some stuff around
([#2151](#2151))
([bad5a6c](bad5a6c))
* **node:** Port (most of) Node to the Node Framework
([#2196](#2196))
([7842bc4](7842bc4))
* **object-store:** Allow caching object store objects locally
([#2153](#2153))
([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface
API ([#1993](#1993))
([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway
([#2150](#2150))
([81ffc6a](81ffc6a))
* Remove initialize_components function
([#2284](#2284))
([0a38891](0a38891))
* **state-keeper:** Add metric for l2 block seal reason
([#2229](#2229))
([f967e6d](f967e6d))
* **state-keeper:** More state keeper metrics
([#2224](#2224))
([1e48cd9](1e48cd9))
* **sync-layer:** adapt MiniMerkleTree to manage priority queue
([#2068](#2068))
([3e72364](3e72364))
* **tee_verifier_input_producer:** use
`FactoryDepsDal::get_factory_deps()`
([#2271](#2271))
([2c0a00a](2c0a00a))
* **toolbox:** add zk_toolbox ci
([#1985](#1985))
([4ab4922](4ab4922))
* updated boojum and nightly rust compiler
([#2126](#2126))
([9e39f13](9e39f13))
* upgraded encoding of transactions in consensus Payload.
([#2245](#2245))
([cb6a6c8](cb6a6c8))
* Use info log level for crates named zksync_* by default
([#2296](#2296))
([9303142](9303142))
* verification of L1Batch witness (BFT-471) - attempt 2
([#2232](#2232))
([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471)
([#2019](#2019))
([6cc5455](6cc5455))
* **vm-runner:** add basic metrics
([#2203](#2203))
([dd154f3](dd154f3))
* **vm-runner:** add protective reads persistence flag for state keeper
([#2307](#2307))
([36d2eb6](36d2eb6))
* **vm-runner:** shadow protective reads using VM runner
([#2017](#2017))
([1402dd0](1402dd0))


### Bug Fixes

* **api:** Fix getting pending block
([#2186](#2186))
([93315ba](93315ba))
* **api:** Fix transaction methods for pruned transactions
([#2168](#2168))
([00c4cca](00c4cca))
* **config:** Fix object store
([#2183](#2183))
([551cdc2](551cdc2))
* **config:** Split object stores
([#2187](#2187))
([9bcdabc](9bcdabc))
* **db:** Fix `insert_proof_generation_details()`
([#2291](#2291))
([c2412cf](c2412cf))
* **db:** Optimize `get_l2_blocks_to_execute_for_l1_batch`
([#2199](#2199))
([06ec5f3](06ec5f3))
* **en:** Fix reorg detection in presence of tree data fetcher
([#2197](#2197))
([20da566](20da566))
* **en:** Fix transient error detection in consistency checker
([#2140](#2140))
([38fdfe0](38fdfe0))
* **en:** Remove L1 client health check
([#2136](#2136))
([49198f6](49198f6))
* **eth-sender:** Don't resend already sent transactions in the same
block ([#2208](#2208))
([3538e9c](3538e9c))
* **eth-sender:** Better error handling in eth-sender
([#2163](#2163))
([0cad504](0cad504))
* **node_framework:** Run gas adjuster task only if necessary
([#2266](#2266))
([2dac846](2dac846))
* **object-store:** Consider more GCS errors transient
([#2246](#2246))
([2f6cd41](2f6cd41))
* **prover_cli:** Remove outdated fix for circuit id in node wg
([#2248](#2248))
([db8e71b](db8e71b))
* **prover:** Disallow state changes from successful
([#2233](#2233))
([2488a76](2488a76))
* **pruning:** Check pruning in metadata calculator
([#2286](#2286))
([7bd8f27](7bd8f27))
* Treat 502s and 503s as transient for GCS OS
([#2202](#2202))
([0a12c52](0a12c52))
* **vm-runner:** add config value for the first processed batch
([#2158](#2158))
([f666717](f666717))
* **vm-runner:** make `last_ready_batch` account for
`first_processed_batch`
([#2238](#2238))
([3889794](3889794))
* **vm:** fix insertion to `decommitted_code_hashes`
([#2275](#2275))
([15bb71e](15bb71e))
* **vm:** Update `decommitted_code_hashes` in `prepare_to_decommit`
([#2253](#2253))
([6c49a50](6c49a50))


### Performance Improvements

* **db:** Improve storage switching for state keeper cache
([#2234](#2234))
([7c8e24c](7c8e24c))
* **db:** Try yet another storage log pruning approach
([#2268](#2268))
([3ee34be](3ee34be))
* **en:** Parallelize persistence and chunk processing during tree
recovery
([#2050](#2050))
([b08a667](b08a667))
* **pruning:** Use more efficient query to delete past storage logs
([#2179](#2179))
([4c18755](4c18755))


### Reverts

* **pruning:** Revert pruning query
([#2220](#2220))
([8427cdd](8427cdd))
* verification of L1Batch witness (BFT-471)
([#2230](#2230))
([227e101](227e101))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot
irnb pushed a commit to vianetwork/via-server that referenced this pull request Jul 12, 2024
🤖 I have created a release *beep* *boop*
---


## [24.8.0](matter-labs/zksync-era@core-v24.7.0...core-v24.8.0) (2024-06-24)


### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler
([matter-labs#2126](matter-labs#2126))

### Features

* Add metrics for transaction execution result in state keeper
([matter-labs#2021](matter-labs#2021))
([dde0fc4](matter-labs@dde0fc4))
* **api:** Add new `l1_committed` block tag
([matter-labs#2282](matter-labs#2282))
([d5e8e9b](matter-labs@d5e8e9b))
* **api:** Rework zks_getProtocolVersion
([matter-labs#2146](matter-labs#2146))
([800b8f4](matter-labs@800b8f4))
* change `zkSync` occurrences to `ZKsync`
([matter-labs#2227](matter-labs#2227))
([0b4104d](matter-labs@0b4104d))
* **contract-verifier:** Adjust contract verifier for zksolc 1.5.0
([matter-labs#2255](matter-labs#2255))
([63efb2e](matter-labs@63efb2e))
* **docs:** Add documentation for subset of wiring layer
implementations, used by Main node
([matter-labs#2292](matter-labs#2292))
([06c287b](matter-labs@06c287b))
* **docs:** Pruning and Snapshots recovery basic docs
([matter-labs#2265](matter-labs#2265))
([619a525](matter-labs@619a525))
* **en:** Allow recovery from specific snapshot
([matter-labs#2137](matter-labs#2137))
([ac61fed](matter-labs@ac61fed))
* **eth-sender:** fix for missing eth_txs_history entries
([matter-labs#2236](matter-labs#2236))
([f05b0ae](matter-labs@f05b0ae))
* Expose fair_pubdata_price for blocks and batches
([matter-labs#2244](matter-labs#2244))
([0d51cd6](matter-labs@0d51cd6))
* **merkle-tree:** Rework tree rollback
([matter-labs#2207](matter-labs#2207))
([c3b9c38](matter-labs@c3b9c38))
* **node-framework:** Add Main Node Client layer
([matter-labs#2132](matter-labs#2132))
([927d842](matter-labs@927d842))
* **node:** Move some stuff around
([matter-labs#2151](matter-labs#2151))
([bad5a6c](matter-labs@bad5a6c))
* **node:** Port (most of) Node to the Node Framework
([matter-labs#2196](matter-labs#2196))
([7842bc4](matter-labs@7842bc4))
* **object-store:** Allow caching object store objects locally
([matter-labs#2153](matter-labs#2153))
([6c6e65c](matter-labs@6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface
API ([matter-labs#1993](matter-labs#1993))
([eca98cc](matter-labs@eca98cc))
* **prover:** Add file based config for fri prover gateway
([matter-labs#2150](matter-labs#2150))
([81ffc6a](matter-labs@81ffc6a))
* Remove initialize_components function
([matter-labs#2284](matter-labs#2284))
([0a38891](matter-labs@0a38891))
* **state-keeper:** Add metric for l2 block seal reason
([matter-labs#2229](matter-labs#2229))
([f967e6d](matter-labs@f967e6d))
* **state-keeper:** More state keeper metrics
([matter-labs#2224](matter-labs#2224))
([1e48cd9](matter-labs@1e48cd9))
* **sync-layer:** adapt MiniMerkleTree to manage priority queue
([matter-labs#2068](matter-labs#2068))
([3e72364](matter-labs@3e72364))
* **tee_verifier_input_producer:** use
`FactoryDepsDal::get_factory_deps()`
([matter-labs#2271](matter-labs#2271))
([2c0a00a](matter-labs@2c0a00a))
* **toolbox:** add zk_toolbox ci
([matter-labs#1985](matter-labs#1985))
([4ab4922](matter-labs@4ab4922))
* updated boojum and nightly rust compiler
([matter-labs#2126](matter-labs#2126))
([9e39f13](matter-labs@9e39f13))
* upgraded encoding of transactions in consensus Payload.
([matter-labs#2245](matter-labs#2245))
([cb6a6c8](matter-labs@cb6a6c8))
* Use info log level for crates named zksync_* by default
([matter-labs#2296](matter-labs#2296))
([9303142](matter-labs@9303142))
* verification of L1Batch witness (BFT-471) - attempt 2
([matter-labs#2232](matter-labs#2232))
([dbcf3c6](matter-labs@dbcf3c6))
* verification of L1Batch witness (BFT-471)
([matter-labs#2019](matter-labs#2019))
([6cc5455](matter-labs@6cc5455))
* **vm-runner:** add basic metrics
([matter-labs#2203](matter-labs#2203))
([dd154f3](matter-labs@dd154f3))
* **vm-runner:** add protective reads persistence flag for state keeper
([matter-labs#2307](matter-labs#2307))
([36d2eb6](matter-labs@36d2eb6))
* **vm-runner:** shadow protective reads using VM runner
([matter-labs#2017](matter-labs#2017))
([1402dd0](matter-labs@1402dd0))


### Bug Fixes

* **api:** Fix getting pending block
([matter-labs#2186](matter-labs#2186))
([93315ba](matter-labs@93315ba))
* **api:** Fix transaction methods for pruned transactions
([matter-labs#2168](matter-labs#2168))
([00c4cca](matter-labs@00c4cca))
* **config:** Fix object store
([matter-labs#2183](matter-labs#2183))
([551cdc2](matter-labs@551cdc2))
* **config:** Split object stores
([matter-labs#2187](matter-labs#2187))
([9bcdabc](matter-labs@9bcdabc))
* **db:** Fix `insert_proof_generation_details()`
([matter-labs#2291](matter-labs#2291))
([c2412cf](matter-labs@c2412cf))
* **db:** Optimize `get_l2_blocks_to_execute_for_l1_batch`
([matter-labs#2199](matter-labs#2199))
([06ec5f3](matter-labs@06ec5f3))
* **en:** Fix reorg detection in presence of tree data fetcher
([matter-labs#2197](matter-labs#2197))
([20da566](matter-labs@20da566))
* **en:** Fix transient error detection in consistency checker
([matter-labs#2140](matter-labs#2140))
([38fdfe0](matter-labs@38fdfe0))
* **en:** Remove L1 client health check
([matter-labs#2136](matter-labs#2136))
([49198f6](matter-labs@49198f6))
* **eth-sender:** Don't resend already sent transactions in the same
block ([matter-labs#2208](matter-labs#2208))
([3538e9c](matter-labs@3538e9c))
* **eth-sender:** Better error handling in eth-sender
([matter-labs#2163](matter-labs#2163))
([0cad504](matter-labs@0cad504))
* **node_framework:** Run gas adjuster task only if necessary
([matter-labs#2266](matter-labs#2266))
([2dac846](matter-labs@2dac846))
* **object-store:** Consider more GCS errors transient
([matter-labs#2246](matter-labs#2246))
([2f6cd41](matter-labs@2f6cd41))
* **prover_cli:** Remove outdated fix for circuit id in node wg
([matter-labs#2248](matter-labs#2248))
([db8e71b](matter-labs@db8e71b))
* **prover:** Disallow state changes from successful
([matter-labs#2233](matter-labs#2233))
([2488a76](matter-labs@2488a76))
* **pruning:** Check pruning in metadata calculator
([matter-labs#2286](matter-labs#2286))
([7bd8f27](matter-labs@7bd8f27))
* Treat 502s and 503s as transient for GCS OS
([matter-labs#2202](matter-labs#2202))
([0a12c52](matter-labs@0a12c52))
* **vm-runner:** add config value for the first processed batch
([matter-labs#2158](matter-labs#2158))
([f666717](matter-labs@f666717))
* **vm-runner:** make `last_ready_batch` account for
`first_processed_batch`
([matter-labs#2238](matter-labs#2238))
([3889794](matter-labs@3889794))
* **vm:** fix insertion to `decommitted_code_hashes`
([matter-labs#2275](matter-labs#2275))
([15bb71e](matter-labs@15bb71e))
* **vm:** Update `decommitted_code_hashes` in `prepare_to_decommit`
([matter-labs#2253](matter-labs#2253))
([6c49a50](matter-labs@6c49a50))


### Performance Improvements

* **db:** Improve storage switching for state keeper cache
([matter-labs#2234](matter-labs#2234))
([7c8e24c](matter-labs@7c8e24c))
* **db:** Try yet another storage log pruning approach
([matter-labs#2268](matter-labs#2268))
([3ee34be](matter-labs@3ee34be))
* **en:** Parallelize persistence and chunk processing during tree
recovery
([matter-labs#2050](matter-labs#2050))
([b08a667](matter-labs@b08a667))
* **pruning:** Use more efficient query to delete past storage logs
([matter-labs#2179](matter-labs#2179))
([4c18755](matter-labs@4c18755))


### Reverts

* **pruning:** Revert pruning query
([matter-labs#2220](matter-labs#2220))
([8427cdd](matter-labs@8427cdd))
* verification of L1Batch witness (BFT-471)
([matter-labs#2230](matter-labs#2230))
([227e101](matter-labs@227e101))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot