
fix(db): Fix write stalls in RocksDB (for real this time) #292

Merged: 4 commits, merged on Oct 24, 2023


slowli (Contributor) commented on Oct 23, 2023

What ❔

The previous fix did not really work, judging by how the Merkle tree behaves on the stage environment. This PR makes the initialization timeout configurable, raising its default from 10 s to 30 s (about the duration of a compaction). It also slightly increases the number of retries on stalled writes.
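The following minimal Rust sketch illustrates the two knobs described above. It is not the code from this PR. All names (`TreeConfig`, `init_stopped_writes_timeout`, `StalledWritesRetries`, `WriteError`, `write_with_retries`) are hypothetical. RocksDB is abstracted as a closure that either succeeds or reports a stalled write. The sketch also assumes that retries sleep for a geometrically growing interval.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical write error: either RocksDB reported stalled/stopped writes,
/// or something unrelated went wrong.
#[derive(Debug, PartialEq)]
enum WriteError {
    Stalled,
    Other(String),
}

/// Hypothetical retry policy for stalled writes (not the crate's actual type).
#[derive(Debug, Clone, Copy)]
struct StalledWritesRetries {
    /// How many times a stalled write is retried after the first attempt.
    retry_count: usize,
    /// Sleep before the first retry.
    start_interval: Duration,
    /// Each subsequent sleep is the previous one multiplied by this factor.
    scale_factor: f64,
}

impl StalledWritesRetries {
    /// Sleep intervals between attempts: start_interval, start_interval * factor, ...
    fn intervals(self) -> impl Iterator<Item = Duration> {
        let factor = self.scale_factor;
        std::iter::successors(Some(self.start_interval), move |prev| {
            Some(prev.mul_f64(factor))
        })
        .take(self.retry_count)
    }
}

/// Hypothetical tree config: how long to wait on startup for stopped writes
/// to clear. The default mirrors the 30 s value mentioned in the PR.
#[derive(Debug)]
struct TreeConfig {
    init_stopped_writes_timeout: Duration,
}

impl Default for TreeConfig {
    fn default() -> Self {
        Self {
            init_stopped_writes_timeout: Duration::from_secs(30),
        }
    }
}

/// Runs `write`, retrying only on `WriteError::Stalled` and sleeping between
/// attempts. Other errors are returned immediately; once retries are
/// exhausted, the last `Stalled` error is returned.
fn write_with_retries<F>(retries: StalledWritesRetries, mut write: F) -> Result<(), WriteError>
where
    F: FnMut() -> Result<(), WriteError>,
{
    let mut intervals = retries.intervals();
    loop {
        match write() {
            Err(WriteError::Stalled) => match intervals.next() {
                Some(interval) => thread::sleep(interval),
                None => return Err(WriteError::Stalled),
            },
            result => return result,
        }
    }
}
```

Raising `retry_count` (or `scale_factor`) lengthens the total time a write may wait for compaction to catch up before the error surfaces. That is the trade-off behind slightly increasing the number of retries.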

Why ❔

Write stalls lead to panics, which is obviously bad.

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.

slowli (Contributor, Author) commented on Oct 23, 2023

A couple of other options we might consider:

  • Increase the number of level-0 SST files at which RocksDB stops writes. I feel that this doesn't solve the problem, just postpones it.
  • Add a subcommand to `rocksdb_util` (`compact`?) that would create a RocksDB instance and wait indefinitely until its writes are no longer stopped. This could work well for the main node, but AFAICT `rocksdb_util` is not currently shipped with the external node (EN) Docker image, and making it work there could take some effort.
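The second option amounts to a polling loop. Here is a minimal Rust sketch under stated assumptions. `wait_for_writes_resumed` is a hypothetical helper. The `writes_stopped` closure stands in for a real check against the database, and RocksDB's `rocksdb.is-write-stopped` property is one candidate signal for it. Passing `timeout: None` waits indefinitely, as the proposed `compact` subcommand would. Passing `Some(..)` bounds the wait, similar to a startup timeout.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `writes_stopped` until it reports `false`.
/// Returns `true` once writes have resumed, or `false` if `timeout` elapsed
/// first. With `timeout == None`, it waits indefinitely.
fn wait_for_writes_resumed<F>(
    mut writes_stopped: F,
    poll_interval: Duration,
    timeout: Option<Duration>,
) -> bool
where
    F: FnMut() -> bool,
{
    let started_at = Instant::now();
    loop {
        if !writes_stopped() {
            return true;
        }
        if let Some(timeout) = timeout {
            if started_at.elapsed() >= timeout {
                return false;
            }
        }
        // Give background compaction time to reduce the level-0 file count.
        thread::sleep(poll_interval);
    }
}
```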

codecov bot commented on Oct 23, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison: base (0e5eefc) 35.58% vs. head (3345c17) 35.61%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #292      +/-   ##
==========================================
+ Coverage   35.58%   35.61%   +0.03%     
==========================================
  Files         520      520              
  Lines       28336    28357      +21     
==========================================
+ Hits        10084    10100      +16     
- Misses      18252    18257       +5     
| File | Coverage Δ |
|---|---|
| core/bin/external_node/src/main.rs | 0.00% <ø> (ø) |
| core/lib/config/src/configs/database.rs | 69.23% <100.00%> (+2.56%) ⬆️ |
| ...lib/zksync_core/src/metadata_calculator/helpers.rs | 87.50% <100.00%> (+0.40%) ⬆️ |
| ...ore/lib/zksync_core/src/metadata_calculator/mod.rs | 88.23% <100.00%> (+0.23%) ⬆️ |
| ...lib/zksync_core/src/metadata_calculator/updater.rs | 94.25% <100.00%> (+0.03%) ⬆️ |
| core/bin/external_node/src/config/mod.rs | 21.42% <0.00%> (-0.80%) ⬇️ |
| core/lib/storage/src/db.rs | 51.90% <78.94%> (+4.61%) ⬆️ |

... and 29 files with indirect coverage changes


RomanBrodetski (Collaborator) previously approved these changes on Oct 23, 2023 and left a comment:


Very good, let's keep trying!

slowli marked this pull request as ready for review on October 23, 2023, 17:56
slowli requested a review from a team as a code owner on October 23, 2023, 17:56
slowli added this pull request to the merge queue on Oct 24, 2023
Merged via the queue into main with commit 0f15919 on Oct 24, 2023
25 checks passed
slowli deleted the aov-pla-629-investigate-write-stalls-in-rocksdb-pt3 branch on October 24, 2023, 07:07
github-merge-queue bot pushed a commit that referenced this pull request on Oct 24, 2023
🤖 I have created a release *beep* *boop*
---


## [16.1.0](core-v16.0.2...core-v16.1.0) (2023-10-24)


### Features

* Add new commitments
([#219](#219))
([a19256e](a19256e))
* arm64 zk-environment rust Docker images and other
([#296](#296))
([33174aa](33174aa))
* **config:** Extract everything not related to the env config from
zksync_config crate
([#245](#245))
([42c64e9](42c64e9))
* **eth-watch:** process governor upgrades
([#247](#247))
([d250294](d250294))
* **merkle tree:** Expose Merkle tree API
([#209](#209))
([4010c7e](4010c7e))
* **merkle tree:** Snapshot recovery for Merkle tree
([#163](#163))
([9e20703](9e20703))
* **multivm:** Remove lifetime from multivm
([#218](#218))
([7eda27c](7eda27c))
* Remove fee_ticker and token_trading_volume fetcher modules
([#262](#262))
([44f7179](44f7179))
* **reorg_detector:** compare miniblock hashes for reorg detection
([#236](#236))
([2c930b2](2c930b2))
* Rewrite server binary to use `vise` metrics
([#120](#120))
([26ee1fb](26ee1fb))
* **types:** introduce state diff record type and compression
([#194](#194))
([ccf753c](ccf753c))
* **vm:** Improve tracer trait
([#121](#121))
([ff60138](ff60138))
* **vm:** Move all vm versions to the one crate
([#249](#249))
([e3fb489](e3fb489))


### Bug Fixes

* **crypto:** update snark-vk to be used in server and update args for
proof wrapping
([#240](#240))
([4a5c54c](4a5c54c))
* **db:** Fix write stalls in RocksDB
([#250](#250))
([650124c](650124c))
* **db:** Fix write stalls in RocksDB (again)
([#265](#265))
([7b23ab0](7b23ab0))
* **db:** Fix write stalls in RocksDB (for real this time)
([#292](#292))
([0f15919](0f15919))
* Fix `TxStage` string representation
([#255](#255))
([246b5a0](246b5a0))
* fix typos
([#226](#226))
([feb8a6c](feb8a6c))
* **witness-generator:** Witness generator oracle with cached storage
refunds ([#274](#274))
([8928a41](8928a41))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).