
BucketlistDB stored-index integrity checks and rebuild scenarios #3677

Closed
graydon opened this issue Mar 14, 2023 · 2 comments

@graydon
Contributor

graydon commented Mar 14, 2023

@MonsieurNicolas and I were discussing today how to limit the blast radius of an error in the prototype on-disk storage of BucketListDB indexes. We came up with a few separate observations, which I think are worth treating as separate work items (though we may decide not to do some of them):

  1. We should definitely have an "index logic version number" in each file and force-rebuild if there's ever a mismatch between the loaded index and the logic in memory. I think this is already done (with BucketIndexVersion), correct?
  2. We might reload-from-disk and then also kick off an index-rebuild in the background, checking that the rebuilt index is identical to the reloaded one. This puts an upper bound on the amount of time that can pass before we observe corruption (which might otherwise go undetected for a while). A sketch of this flow appears after this list.
  3. We might disable index-reloading on validators. There's some precedent in our code for turning off "stuff that has more failure modes" when NODE_IS_VALIDATOR=true: we currently reject configurations that try to use streaming txmeta, for example, because it can cause a synchronous stall during ledger close, or in-memory mode, because it might lose durable records. We could similarly say that you only get fast-restarts on observer / captive-core nodes.
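
To make observation 2 concrete, here is a minimal sketch of the reload-then-verify flow. All of the names here (`Index`, `loadIndexFromDisk`, `rebuildIndexFromBucket`) are hypothetical stand-ins for illustration, not actual stellar-core APIs.

```cpp
// Minimal sketch of observation 2: use the stored index immediately, but
// rebuild it in the background and compare the two.
#include <future>
#include <iostream>
#include <map>
#include <string>

// Stand-in for a bucket index: ledger key -> offset of the entry in the
// bucket file.
using Index = std::map<std::string, std::streamoff>;

// Placeholder loaders; real code would deserialize the stored index and
// re-scan the bucket file, respectively.
Index loadIndexFromDisk(std::string const& /*indexPath*/) { return {}; }
Index rebuildIndexFromBucket(std::string const& /*bucketPath*/) { return {}; }

bool
verifyStoredIndex(std::string const& indexPath, std::string const& bucketPath)
{
    // Use the stored index right away so startup stays fast...
    Index loaded = loadIndexFromDisk(indexPath);

    // ...while a rebuild from the bucket file itself runs in the background.
    std::future<Index> rebuilt =
        std::async(std::launch::async, rebuildIndexFromBucket, bucketPath);

    // Once the rebuild completes, any divergence means the stored index is
    // corrupt; this bounds how long corruption can go unnoticed.
    if (rebuilt.get() != loaded)
    {
        std::cerr << "stored index mismatch for " << bucketPath
                  << "; falling back to the rebuilt index\n";
        return false;
    }
    return true;
}
```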

We also discussed ways we might stress-test the index code to ensure there are no bugs in the bloom filter. It's fairly critical that the bloom filter work 100% reliably, and any corruption caused by a bug in it would be very hard to fix in the field. Besides just replaying history with it turned on, we thought it might be worth writing a small randomized in-memory tester that maintains a "true" reference hashmap against which it can check the bloom filter's true/false-positive behaviour.
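
A randomized tester along these lines might look roughly like the following. The `BloomFilter` class is a toy stand-in, not BucketListDB's filter, and the key type, sizes, and hash scheme are all assumptions; the point is the shape of the test: drive the filter and a reference `std::unordered_set` with the same random keys, assert zero false negatives, and count false positives.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <random>
#include <string>
#include <unordered_set>
#include <vector>

// Toy bloom filter: double hashing over std::hash, k probes into a bit array.
class BloomFilter
{
    std::vector<bool> mBits;
    size_t mNumHashes;

    size_t nthHash(std::string const& key, size_t i) const
    {
        size_t h1 = std::hash<std::string>{}(key);
        size_t h2 = std::hash<std::string>{}(key + "#salt");
        return (h1 + i * h2) % mBits.size();
    }

  public:
    BloomFilter(size_t bits, size_t hashes) : mBits(bits), mNumHashes(hashes) {}

    void insert(std::string const& key)
    {
        for (size_t i = 0; i < mNumHashes; ++i)
            mBits[nthHash(key, i)] = true;
    }

    bool mayContain(std::string const& key) const
    {
        for (size_t i = 0; i < mNumHashes; ++i)
            if (!mBits[nthHash(key, i)])
                return false;
        return true;
    }
};

int main()
{
    std::mt19937 rng(0xb10f);
    auto randomKey = [&] { return std::to_string(rng()); };

    BloomFilter filter(1 << 20, 4);
    std::unordered_set<std::string> reference; // the "true" hashmap

    // Drive both structures with the same random keys.
    for (int i = 0; i < 100000; ++i)
    {
        auto key = randomKey();
        filter.insert(key);
        reference.insert(key);
    }

    // No false negatives allowed: every key actually present must be reported
    // as possibly present. A single failure here is a correctness bug.
    for (auto const& key : reference)
        assert(filter.mayContain(key));

    // False positives are allowed but should be rare; count them as a sanity
    // check on the filter's sizing and hashing. The "absent-" prefix
    // guarantees these keys were never inserted.
    int falsePositives = 0;
    for (int i = 0; i < 100000; ++i)
        if (filter.mayContain("absent-" + randomKey()))
            ++falsePositives;

    std::cout << "false positives: " << falsePositives << " / 100000\n";
}
```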

Any other ideas for tilting the odds? Please add to this bug / discussion below!

@SirTyson
Contributor

1 is already implemented. The first field on disk is the version number, followed by the page size, then the index data. On load we check the version first, then the page size, to see whether any config parameters have changed (all config options are captured by the page size).
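
For illustration, that ordering amounts to a check like the following on load. The header struct and read logic here are assumptions for the sketch, not stellar-core's actual serialization format:

```cpp
// Illustrative sketch of the on-disk ordering described above: version,
// then page size, then data.
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

constexpr uint32_t CURRENT_INDEX_VERSION = 1; // stand-in for BucketIndexVersion

// Placeholder: the real value is derived from the relevant config options.
uint32_t
configuredPageSize()
{
    return 16384;
}

struct IndexHeader
{
    uint32_t version;
    uint32_t pageSize;
};

// Returns the header if the stored index is usable, or nullopt to signal
// "ignore the stored index and rebuild from the bucket file".
std::optional<IndexHeader>
checkStoredIndex(std::string const& path)
{
    std::ifstream in(path, std::ios::binary);
    IndexHeader hdr{};
    if (!in.read(reinterpret_cast<char*>(&hdr), sizeof(hdr)))
    {
        return std::nullopt; // missing or truncated: rebuild
    }
    if (hdr.version != CURRENT_INDEX_VERSION)
    {
        return std::nullopt; // index logic changed: rebuild
    }
    if (hdr.pageSize != configuredPageSize())
    {
        return std::nullopt; // config changed: rebuild
    }
    return hdr; // safe to deserialize the rest of the file
}
```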

I think 2 wouldn't be super helpful and would have significant performance side effects. Even if we catch a bad index, we've probably already used it by then, especially on systems with slow disks (currently up to 5 minutes for the largest indexes on EBS). Core is already IO-bound on the startup path if catchup or bucket apply has to run, and background indexing would make this worse. The effect compounds: background indexing would probably take more than 5 minutes if bucket apply were running at the same time, meaning the potentially bad index would live even longer.

I think 3 is a good idea, especially in the short term so we can get this fix out to captive-core. We already have an EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX config flag that defaults to true. All we need to do is set this flag to false if NODE_IS_VALIDATOR=true. Additionally, the sync-loss bug didn't come up during testing because I was testing on a validator SKU with NVMe drives (although I need to double-check this). If validators are running NVMe drives, they wouldn't benefit from disk serialization nearly as much as the Horizon nodes do, and may not experience the sync-loss bug at all. At least for this initial PR, I like only serializing on watcher nodes.
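
Concretely, the gating could be as simple as the sketch below. The Config fields mirror the real option names, but the function itself is illustrative, not the actual stellar-core config-validation code:

```cpp
#include <iostream>

// Illustrative stand-in for the two relevant config options.
struct Config
{
    bool NODE_IS_VALIDATOR = false;
    bool EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX = true; // current default
};

// Hypothetical adjustment step run after config parsing: validators fall back
// to rebuilding indexes on startup instead of reloading them from disk.
void
adjustBucketIndexConfig(Config& cfg)
{
    if (cfg.NODE_IS_VALIDATOR && cfg.EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX)
    {
        cfg.EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX = false;
        std::cout << "NODE_IS_VALIDATOR=true: disabling stored bucket indexes;"
                     " they will be rebuilt on startup\n";
    }
}
```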

@SirTyson
Contributor

SirTyson commented Mar 6, 2024

At this point, between the watcher-node-only gating, additional unit tests, and captive-core running BucketListDB with persisted indexes for almost a year without issue, I think BucketListDB is mature enough to close this.

SirTyson closed this as completed Mar 6, 2024