feat(proof-data-handler): exclude batches without object file in GCS #2980

Merged

pbeza
Collaborator

@pbeza commented Sep 27, 2024

What ❔

The /tee/proof_inputs endpoint no longer returns batches that have had no corresponding object file in Google Cloud Storage for an extended period.

Why ❔

Since the recent 24.25.0 redeployment on mainnet, TEE's proof-data-handler has been flooding the logs with warnings (in this context, the warnings are not actually fatal):

Failed request with a fatal error

(...)

Blobs for batch numbers 490520 to 490555 not found in the object store. Marked as unpicked.

The issue is caused by the code behind the /tee/proof_inputs endpoint (which is equivalent to the /proof_generation_data endpoint) – it finds the next batch to send to the requesting tee-prover by looking for the first batch that has a corresponding object in the Google object store. As it skips over batches that don’t have the objects, it logs Failed request with a fatal error for each one (unless the skipped batch was successfully proven, in which case it doesn’t log the error). This happens with every request the tee-prover sends, which is why we're getting so much noise in the logs.
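To make the log-noise mechanism concrete, here is a minimal, hypothetical Rust model of the pre-fix lookup. It is not the actual proof-data-handler code: the function name `pick_next_batch_old`, the in-memory `HashSet`s standing in for the object store and the set of proven batches, and the `Vec<String>` used as a log are all assumptions made for illustration. It shows why the warning count grows with every request: each call rescans the same gap of missing blobs and logs every unproven batch in it again.

```rust
use std::collections::HashSet;

/// Hypothetical model of the pre-fix lookup: return the first candidate batch
/// whose input blob exists in the object store, logging one warning for every
/// skipped batch that has not already been proven.
fn pick_next_batch_old(
    candidates: &[u32],
    blobs_in_store: &HashSet<u32>,
    proven: &HashSet<u32>,
    log: &mut Vec<String>,
) -> Option<u32> {
    for &batch in candidates {
        if blobs_in_store.contains(&batch) {
            return Some(batch);
        }
        if !proven.contains(&batch) {
            log.push(format!("Failed request with a fatal error (batch {})", batch));
        }
    }
    None
}

fn main() {
    // Blobs are missing for 490520..=490555 (36 batches), as in the log excerpt.
    let candidates: Vec<u32> = (490_520..=490_560).collect();
    let blobs_in_store: HashSet<u32> = (490_556..=490_560).collect();
    let proven: HashSet<u32> = HashSet::new();
    let mut log = Vec::new();

    // Every tee-prover request repeats the same scan over the same gap.
    for _request in 0..3 {
        let picked = pick_next_batch_old(&candidates, &blobs_in_store, &proven, &mut log);
        assert_eq!(picked, Some(490_556));
    }
    println!("warnings logged: {}", log.len()); // 36 batches x 3 requests = 108
}
```

In this model the returned batch never changes, while the number of warnings is the size of the gap multiplied by the number of requests.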

One possible solution is to flag the problematic batches as permanently_ignored, like Thomas did before on mainnet.
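One way to automate that flagging, sketched under explicit assumptions: the `BatchStatus` enum, the per-batch `age` field, and the `max_missing_age` cutoff below are hypothetical names chosen for illustration, not the PR's actual types or configuration. A batch whose blob is missing is skipped as before, but once it has been waiting longer than the cutoff it is flagged `PermanentlyIgnored`, so later requests no longer revisit it.

```rust
use std::collections::{BTreeMap, HashSet};
use std::time::Duration;

/// Hypothetical per-batch status for TEE proving (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BatchStatus {
    Unpicked,
    PermanentlyIgnored,
}

/// Hypothetical bookkeeping for one batch.
struct Batch {
    status: BatchStatus,
    /// How long the batch has been waiting for TEE proving.
    age: Duration,
}

/// Returns the first unpicked batch whose input blob exists in the store.
/// Batches with a missing blob are skipped; if they are at least
/// `max_missing_age` old, they are also flagged as permanently ignored so
/// that future calls skip them without re-checking the store.
fn pick_next_batch(
    batches: &mut BTreeMap<u32, Batch>,
    blobs_in_store: &HashSet<u32>,
    max_missing_age: Duration,
) -> Option<u32> {
    for (&number, batch) in batches.iter_mut() {
        if batch.status != BatchStatus::Unpicked {
            continue; // already excluded
        }
        if blobs_in_store.contains(&number) {
            return Some(number);
        }
        if batch.age >= max_missing_age {
            batch.status = BatchStatus::PermanentlyIgnored;
        }
    }
    None
}

fn main() {
    const DAY: u64 = 24 * 60 * 60;
    let max_missing_age = Duration::from_secs(10 * DAY); // hypothetical cutoff
    let mut batches: BTreeMap<u32, Batch> = BTreeMap::new();
    // (batch number, age in days); blobs exist only for batches 4 and 5.
    for (number, age_days) in [(1u32, 30u64), (2, 20), (3, 0), (4, 0), (5, 0)] {
        let age = Duration::from_secs(age_days * DAY);
        batches.insert(number, Batch { status: BatchStatus::Unpicked, age });
    }
    let blobs: HashSet<u32> = [4u32, 5].iter().copied().collect();

    let picked = pick_next_batch(&mut batches, &blobs, max_missing_age);
    println!("picked: {:?}", picked); // picked: Some(4)
    for (number, batch) in &batches {
        println!("batch {}: {:?}", number, batch.status);
    }
}
```

Here batches 1 and 2 end up `PermanentlyIgnored`, while batch 3 stays `Unpicked`: keeping recent batches eligible leaves room for blobs that are merely delayed rather than lost.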

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.

@pbeza force-pushed the tee/flag-old-batches-as-permanently-ignored-automatically branch 3 times, most recently from f1b8ad3 to 65cc26e on September 30, 2024 11:22
@pbeza marked this pull request as ready for review September 30, 2024 12:02
@pbeza
Collaborator Author

pbeza commented Sep 30, 2024

@popzxc, I remember you mentioned not to ask for code reviews this wave, but you're probably the most familiar with this code (along with @slowli). So, if you could make an exception this time, I’d really appreciate it. If you're busy, no worries – feel free to ignore, and I’ll ask @RomanBrodetski to find someone else. Thanks!

@pbeza
Collaborator Author

pbeza commented Oct 1, 2024

Kindly ping @slowli @RomanBrodetski. I need a reviewer.

core/lib/object_store/src/retries.rs (outdated, resolved)
core/lib/types/src/tee_types.rs (outdated, resolved)
core/lib/types/src/tee_types.rs (outdated, resolved)
core/node/proof_data_handler/src/tee_request_processor.rs (outdated, resolved)
core/node/proof_data_handler/src/tee_request_processor.rs (outdated, resolved)
Collaborator

@RomanBrodetski left a comment

@pbeza to be honest I don't fully follow this solution. I understand what we are trying to do (mark older unresolved jobs as skipped), but I'm not sure I understand the Why here. We can discuss over a huddle or async

core/lib/types/src/tee_types.rs (outdated, resolved)
@pbeza
Collaborator Author

pbeza commented Oct 8, 2024

JFYI: This PR is on hold because the code it is based on was recently radically redesigned/refactored here: #3017. This PR may be cherry-picked/revisited once #3017 is merged into main.

@pbeza force-pushed the tee/flag-old-batches-as-permanently-ignored-automatically branch 17 times, most recently from 4ee505b to bfeddc9 on October 31, 2024 18:29
pbeza added 16 commits November 18, 2024 12:49
We already depend on zksync_types, which re-exports a substantial part
of basic types.
@pbeza
Collaborator Author

pbeza commented Nov 20, 2024

@slowli @haraldh I just retested everything manually to ensure that transitioning to the permanently_ignored state works correctly when there are no object files (you can't imagine how time-consuming and painful it is to reproduce). Everything seems to be working perfectly. Ready to merge from my perspective. 🫡

@pbeza requested review from slowli and haraldh November 20, 2024 16:59
@haraldh added this pull request to the merge queue Nov 21, 2024
Merged via the queue into main with commit 3e309e0 Nov 21, 2024
43 checks passed
@haraldh deleted the tee/flag-old-batches-as-permanently-ignored-automatically branch November 21, 2024 08:37
github-merge-queue bot pushed a commit that referenced this pull request Dec 11, 2024
🤖 I have created a release *beep* *boop*
---


## [25.3.0](core-v25.2.0...core-v25.3.0) (2024-12-11)


### Features

* change seal criteria for gateway
([#3320](#3320))
([a0a74aa](a0a74aa))
* **contract-verifier:** Download compilers from GH automatically
([#3291](#3291))
([a10c4ba](a10c4ba))
* integrate gateway changes for some components
([#3274](#3274))
([cbc91e3](cbc91e3))
* **proof-data-handler:** exclude batches without object file in GCS
([#2980](#2980))
([3e309e0](3e309e0))
* **pruning:** Record L1 batch root hash in pruning logs
([#3266](#3266))
([7b6e590](7b6e590))
* **state-keeper:** mempool io opens batch if there is protocol upgrade
tx ([#3360](#3360))
([f6422cd](f6422cd))
* **tee:** add error handling for unstable_getTeeProofs API endpoint
([#3321](#3321))
([26f630c](26f630c))
* **zksync_cli:** Health checkpoint improvements
([#3193](#3193))
([440fe8d](440fe8d))


### Bug Fixes

* **api:** batch fee input scaling for `debug_traceCall`
([#3344](#3344))
([7ace594](7ace594))
* **tee:** correct previous fix for race condition in batch locking
([#3358](#3358))
([b12da8d](b12da8d))
* **tee:** fix race condition in batch locking
([#3342](#3342))
([a7dc0ed](a7dc0ed))
* **tracer:** adds vm error to flatCallTracer error field if exists
([#3374](#3374))
([5d77727](5d77727))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
gianbelinche added a commit to lambdaclass/zksync-era that referenced this pull request Jan 3, 2025
…vars (#371)

* feat(state-keeper): mempool io opens batch if there is protocol upgrade tx (matter-labs#3360)

## What ❔

Mempool io opens batch if there is protocol upgrade tx

## Why ❔

Currently, if the mempool is empty but there is a protocol upgrade tx,
the batch is not opened.

## Checklist

- [ ] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [ ] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* fix: Fixed cargo deny (matter-labs#3372)

## What ❔

Fixes the cargo deny CI failure.

* docs: interop docs update (matter-labs#3366)

## What ❔

## Why ❔

## Checklist

- [ ] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [ ] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* fix(tracer): adds vm error to flatCallTracer error field if exists (matter-labs#3374)

## What ❔

- Updates `flatCallTracer` error to include vm error if it exists 

## Why ❔

- MM has requested that, if an error exists, we populate it in the
`flatCallTracer` error field, as other implementations do. Before this
PR, only revert_reason was populated there (introduced in
matter-labs#3306). However, when a VM error occurs, the error field is
not populated, as seen in this tx:
`0x6c85bf34666dcdaa885f2bc6e95186029d2b25f2a3bbdff21c36878e2d4a19ed`
which failed due to a VM panic.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* chore(main): release core 25.3.0 (matter-labs#3313)

:robot: I have created a release *beep* *boop*
---


## [25.3.0](matter-labs/zksync-era@core-v25.2.0...core-v25.3.0) (2024-12-11)


### Features

* change seal criteria for gateway
([matter-labs#3320](matter-labs#3320))
([a0a74aa](matter-labs@a0a74aa))
* **contract-verifier:** Download compilers from GH automatically
([matter-labs#3291](matter-labs#3291))
([a10c4ba](matter-labs@a10c4ba))
* integrate gateway changes for some components
([matter-labs#3274](matter-labs#3274))
([cbc91e3](matter-labs@cbc91e3))
* **proof-data-handler:** exclude batches without object file in GCS
([matter-labs#2980](matter-labs#2980))
([3e309e0](matter-labs@3e309e0))
* **pruning:** Record L1 batch root hash in pruning logs
([matter-labs#3266](matter-labs#3266))
([7b6e590](matter-labs@7b6e590))
* **state-keeper:** mempool io opens batch if there is protocol upgrade
tx ([matter-labs#3360](matter-labs#3360))
([f6422cd](matter-labs@f6422cd))
* **tee:** add error handling for unstable_getTeeProofs API endpoint
([matter-labs#3321](matter-labs#3321))
([26f630c](matter-labs@26f630c))
* **zksync_cli:** Health checkpoint improvements
([matter-labs#3193](matter-labs#3193))
([440fe8d](matter-labs@440fe8d))


### Bug Fixes

* **api:** batch fee input scaling for `debug_traceCall`
([matter-labs#3344](matter-labs#3344))
([7ace594](matter-labs@7ace594))
* **tee:** correct previous fix for race condition in batch locking
([matter-labs#3358](matter-labs#3358))
([b12da8d](matter-labs@b12da8d))
* **tee:** fix race condition in batch locking
([matter-labs#3342](matter-labs#3342))
([a7dc0ed](matter-labs@a7dc0ed))
* **tracer:** adds vm error to flatCallTracer error field if exists
([matter-labs#3374](matter-labs#3374))
([5d77727](matter-labs@5d77727))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>

* feat(eigen-client-extra-features): Fix PR comments (#369)

* Add envy load

* Readd proto reference

* Rename blob id to request id

* Make literals constants

* Make point size constant

* Get pool unique

* Remaining comments

* Fix comment

* Add check for failed states

* Change l1 name

* Cargo lock conflicts

* remove concurrent dispatcher leftovers

* Solve comments (#372)

* remove METRICS var

* feat(eigen-client-extra-features): address PR comments (#375)

* Change settlement layer for u32

* Change string to address

* Remove unwraps

* Remove error from name

* Remove unused to bytes

* Rename call for get blob data

* Revert "Change string to address"

This reverts commit 6dd94d4.

* Change string for address

* feat(eigen-client-extra-features): address PR comments (part 2) (#374)

* initial commit

* clippy suggestion

* feat(eigen-client-extra-features): address PR comments (part 3) (#376)

* use keccak256 fn

* simplify get_context_block

* use saturating sub

* feat(eigen-client-extra-features): address PR comments (part 4) (#378)

* Replace decode bytes for ethabi

* Add default to eigenconfig

* Change str to url

* Add index to data availability table

* Address comments

* Change error to verificationerror

* Format code

* feat(eigen-client-extra-features): address PR comments (part 5) (#377)

* use trait object

* prevent blocking non async code

* clippy suggestion

---------

Co-authored-by: juan518munoz <[email protected]>

---------

Co-authored-by: Gianbelinche <[email protected]>

---------

Co-authored-by: Gianbelinche <[email protected]>

* Format code

---------

Co-authored-by: juan518munoz <[email protected]>

---------

Co-authored-by: perekopskiy <[email protected]>
Co-authored-by: Bruno França <[email protected]>
Co-authored-by: kelemeno <[email protected]>
Co-authored-by: Dustin Brickwood <[email protected]>
Co-authored-by: zksync-era-bot <[email protected]>
Co-authored-by: zksync-era-bot <[email protected]>
Co-authored-by: Gianbelinche <[email protected]>
gianbelinche added a commit to lambdaclass/zksync-era that referenced this pull request Jan 3, 2025
* feat(state-keeper): mempool io opens batch if there is protocol upgrade tx (matter-labs#3360)

## What ❔

Mempool io opens batch if there is protocol upgrade tx

## Why ❔

Currently, if the mempool is empty but there is a protocol upgrade tx,
the batch is not opened.

## Checklist

- [ ] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [ ] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* fix: Fixed cargo deny (matter-labs#3372)

## What ❔

Fixes the cargo deny CI failure.

* docs: interop docs update (matter-labs#3366)

## What ❔

## Why ❔

## Checklist

- [ ] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [ ] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* fix(tracer): adds vm error to flatCallTracer error field if exists (matter-labs#3374)

## What ❔

- Updates `flatCallTracer` error to include vm error if it exists 

## Why ❔

- MM has requested that, if an error exists, we populate it in the
`flatCallTracer` error field, as other implementations do. Before this
PR, only revert_reason was populated there (introduced in
matter-labs#3306). However, when a VM error occurs, the error field is
not populated, as seen in this tx:
`0x6c85bf34666dcdaa885f2bc6e95186029d2b25f2a3bbdff21c36878e2d4a19ed`
which failed due to a VM panic.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.

* chore(main): release core 25.3.0 (matter-labs#3313)

:robot: I have created a release *beep* *boop*
---


## [25.3.0](matter-labs/zksync-era@core-v25.2.0...core-v25.3.0) (2024-12-11)


### Features

* change seal criteria for gateway
([matter-labs#3320](matter-labs#3320))
([a0a74aa](matter-labs@a0a74aa))
* **contract-verifier:** Download compilers from GH automatically
([matter-labs#3291](matter-labs#3291))
([a10c4ba](matter-labs@a10c4ba))
* integrate gateway changes for some components
([matter-labs#3274](matter-labs#3274))
([cbc91e3](matter-labs@cbc91e3))
* **proof-data-handler:** exclude batches without object file in GCS
([matter-labs#2980](matter-labs#2980))
([3e309e0](matter-labs@3e309e0))
* **pruning:** Record L1 batch root hash in pruning logs
([matter-labs#3266](matter-labs#3266))
([7b6e590](matter-labs@7b6e590))
* **state-keeper:** mempool io opens batch if there is protocol upgrade
tx ([matter-labs#3360](matter-labs#3360))
([f6422cd](matter-labs@f6422cd))
* **tee:** add error handling for unstable_getTeeProofs API endpoint
([matter-labs#3321](matter-labs#3321))
([26f630c](matter-labs@26f630c))
* **zksync_cli:** Health checkpoint improvements
([matter-labs#3193](matter-labs#3193))
([440fe8d](matter-labs@440fe8d))


### Bug Fixes

* **api:** batch fee input scaling for `debug_traceCall`
([matter-labs#3344](matter-labs#3344))
([7ace594](matter-labs@7ace594))
* **tee:** correct previous fix for race condition in batch locking
([matter-labs#3358](matter-labs#3358))
([b12da8d](matter-labs@b12da8d))
* **tee:** fix race condition in batch locking
([matter-labs#3342](matter-labs#3342))
([a7dc0ed](matter-labs@a7dc0ed))
* **tracer:** adds vm error to flatCallTracer error field if exists
([matter-labs#3374](matter-labs#3374))
([5d77727](matter-labs@5d77727))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>

* feat(eigen-client-extra-features): Fix PR comments (#369)

* Add envy load

* Readd proto reference

* Rename blob id to request id

* Make literals constants

* Make point size constant

* Get pool unique

* Remaining comments

* Fix comment

* Add check for failed states

* Change l1 name

* Cargo lock conflicts

* remove concurrent dispatcher leftovers

* Solve comments (#372)

* Remove eigen client for external crate

* Add real repo

* remove METRICS var

* Change proxy name and remove generic

* feat(eigen-client-extra-features): address PR comments (#375)

* Change settlement layer for u32

* Change string to address

* Remove unwraps

* Remove error from name

* Remove unused to bytes

* Rename call for get blob data

* Revert "Change string to address"

This reverts commit 6dd94d4.

* Change string for address

* feat(eigen-client-extra-features): address PR comments (part 2) (#374)

* initial commit

* clippy suggestion

* feat(eigen-client-extra-features): address PR comments (part 3) (#376)

* use keccak256 fn

* simplify get_context_block

* use saturating sub

* feat(eigen-client-extra-features): address PR comments (part 4) (#378)

* Replace decode bytes for ethabi

* Add default to eigenconfig

* Change str to url

* Add index to data availability table

* Address comments

* Change error to verificationerror

* Format code

* feat(eigen-client-extra-features): address PR comments (part 5) (#377)

* use trait object

* prevent blocking non async code

* clippy suggestion

---------

Co-authored-by: juan518munoz <[email protected]>

---------

Co-authored-by: Gianbelinche <[email protected]>

---------

Co-authored-by: Gianbelinche <[email protected]>

* Format code

---------

Co-authored-by: juan518munoz <[email protected]>

* Fix compilation

* Update branch

---------

Co-authored-by: perekopskiy <[email protected]>
Co-authored-by: Bruno França <[email protected]>
Co-authored-by: kelemeno <[email protected]>
Co-authored-by: Dustin Brickwood <[email protected]>
Co-authored-by: zksync-era-bot <[email protected]>
Co-authored-by: zksync-era-bot <[email protected]>
Co-authored-by: Juan Munoz <[email protected]>
Co-authored-by: juan518munoz <[email protected]>