@MonsieurNicolas and I were discussing today how to limit the blast radius of an error in the prototype on-disk storage of BucketListDB indexes. We came up with a few separate observations, which I think might be worth treating as separate work items (though we might decide to skip some):
1. We should definitely have an "index logic version number" in each file and force-rebuild if there's ever a mismatch between the loaded index and the logic in memory. I think this is already done (with BucketIndexVersion), correct?
2. We might reload-from-disk and then also kick off an index rebuild in the background, checking that the rebuilt index is identical to the reloaded one. This puts an upper bound on the amount of time that can pass before we observe corruption (which might otherwise take a while to surface).
3. We might disable index-reloading on validators. There's some precedent in our code for turning off "stuff that has more failure modes" when NODE_IS_VALIDATOR=true: we currently reject configurations that try to use streaming txmeta, for example, because it can cause a synchronous stall during ledger close, or in-memory mode, because it might lose durable records. We could similarly say that you only get fast restarts on observer / captive-core nodes.
4. We also discussed ways we might stress-test the index code to ensure there are no bugs in the bloom filter. It's fairly critical that the bloom filter works 100% reliably, since small corruption caused by a bug in it would be very hard to fix in the field. Besides just replaying history with it turned on, we thought it might be worth writing a small randomized in-memory tester that keeps a "true" reference hashmap against which it can check the bloom filter's true/false-positive behaviour (see the sketch after this list).
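A minimal sketch of what that randomized tester could look like. The `BloomFilter` type below is a toy stand-in (double hashing over a splitmix64-style mixer), not the real BucketListDB index filter, and every name in it is illustrative; the point is the harness: insert the same random keys into the filter and a reference hashmap, assert zero false negatives, and measure the false-positive rate.

```cpp
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_set>
#include <vector>

// Toy bloom filter used only to demonstrate the test harness.
struct BloomFilter
{
    std::vector<bool> mBits;
    size_t mNumHashes;

    BloomFilter(size_t nBits, size_t nHashes) : mBits(nBits), mNumHashes(nHashes)
    {
    }

    // splitmix64 finalizer: a cheap, well-mixed 64-bit hash.
    static uint64_t mix(uint64_t x)
    {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Derive the i-th bit index via double hashing (Kirsch-Mitzenmacher).
    size_t index(uint64_t key, size_t i) const
    {
        return (mix(key) + i * mix(~key)) % mBits.size();
    }

    void insert(uint64_t key)
    {
        for (size_t i = 0; i < mNumHashes; ++i)
            mBits[index(key, i)] = true;
    }

    bool mayContain(uint64_t key) const
    {
        for (size_t i = 0; i < mNumHashes; ++i)
            if (!mBits[index(key, i)])
                return false;
        return true;
    }
};

int main()
{
    std::mt19937_64 gen(42);
    BloomFilter filter(1 << 21, 7);         // ~2M bits, 7 hash functions
    std::unordered_set<uint64_t> reference; // the "true" hashmap

    // Insert the same random keys into both structures.
    for (int i = 0; i < 100000; ++i)
    {
        uint64_t key = gen();
        filter.insert(key);
        reference.insert(key);
    }

    // Invariant: a bloom filter must never produce a false negative.
    for (uint64_t key : reference)
    {
        if (!filter.mayContain(key))
        {
            std::cerr << "BUG: false negative for key " << key << "\n";
            return 1;
        }
    }

    // Probe with fresh keys to estimate the false-positive rate.
    int falsePositives = 0;
    int const probes = 1000000;
    for (int i = 0; i < probes; ++i)
    {
        uint64_t key = gen();
        if (reference.count(key) == 0 && filter.mayContain(key))
            ++falsePositives;
    }
    std::cout << "false-positive rate: "
              << static_cast<double>(falsePositives) / probes << "\n";
    return 0;
}
```

The same harness could be pointed at the real filter type, and fuzzed over page sizes and key counts, to catch any false negative (which would be a correctness bug) and any drift in the false-positive rate (which would be a performance bug).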
Any other ideas for tilting the odds? Please add to this bug / discussion below!
1 is already implemented. The first field on disk is the version number, then the page size, then the data. On load, we first check the version, then check the page size to see if any config parameters have changed (all config options are captured by the page size).
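For reference, a hedged sketch of those load-time checks, inferred from the layout described above; the field names, the packed-struct serialization, and the constant are all assumptions, not the real code:

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Assumed constant mirroring the in-memory BucketIndexVersion.
constexpr uint32_t BUCKET_INDEX_VERSION = 1;

struct IndexHeader
{
    uint32_t version;
    uint32_t pageSize;
};

// Returns the header if the on-disk index is usable; std::nullopt means
// the caller should discard the file and rebuild the index from scratch.
std::optional<IndexHeader>
loadIndexHeader(std::string const& path, uint32_t configuredPageSize)
{
    std::ifstream in(path, std::ios::binary);
    IndexHeader h{};
    if (!in.read(reinterpret_cast<char*>(&h), sizeof(h)))
    {
        return std::nullopt; // missing or truncated file: rebuild
    }
    if (h.version != BUCKET_INDEX_VERSION)
    {
        return std::nullopt; // index logic changed: force rebuild
    }
    if (h.pageSize != configuredPageSize)
    {
        return std::nullopt; // config parameters changed: force rebuild
    }
    return h; // safe to deserialize the rest of the index
}
```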
I think 2 wouldn't be super helpful and would have significant performance side effects. Even if we catch a bad index, we've probably already used it by the time the background rebuild finishes, especially on systems with slow disks (currently up to 5 minutes for the largest indexes on EBS). Core is already I/O-bound on the startup path if catchup or bucket apply has to run, and background indexing would make that worse. The effect compounds: background indexing would probably take more than 5 minutes if bucket apply were running at the same time, so a potentially bad index would live even longer.
I think 3 is a good idea, especially in the short term so we can get this fix out to captive-core. We already have an EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX config flag that defaults to true. All we need to do is set this flag to false when NODE_IS_VALIDATOR=true. Additionally, the sync-loss bug didn't come up during testing because I was testing on a validator SKU with NVMe drives (although I need to double-check this). If validators are running NVMe drives, they wouldn't benefit from disk serialization nearly as much as the Horizon nodes do, and may not experience the sync-loss bug at all. At least for this initial PR, I like only serializing on watcher nodes.
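A minimal sketch of that guard, assuming a Config struct carrying the two flags named above; the surrounding structure and function name are hypothetical:

```cpp
// Hypothetical Config subset; the real stellar-core Config has many
// more fields and its own validation pass.
struct Config
{
    bool NODE_IS_VALIDATOR = false;
    bool EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX = true; // defaults to true
};

// Force persisted indexes off on validators, mirroring how other
// higher-risk options are rejected when NODE_IS_VALIDATOR=true.
// Validators then rebuild indexes in memory on every start, while
// watcher / captive-core nodes keep the fast-restart path.
void
adjustForValidator(Config& cfg)
{
    if (cfg.NODE_IS_VALIDATOR)
    {
        cfg.EXPERIMENTAL_BUCKETLIST_DB_PERSIST_INDEX = false;
    }
}
```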
At this point, between watcher-node-only persistence, additional unit tests, and captive-core running BucketListDB with persisted indexes for almost a year without issue, I think BucketListDB is mature enough to close this.