bulk: range cache miss during index backfill causes sustained slowdown #84290
Labels: A-disaster-recovery, C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-disaster-recovery
Comments
nvanbenschoten added the C-bug and A-disaster-recovery labels on Jul 12, 2022.
craig bot pushed a commit that referenced this issue on Sep 20, 2022:
- 85859: pgwire: Add support for cursors with special characters (r=knz,rafiss a=rimadeodhar). In CockroachDB and Postgres, it is possible to declare cursors with special characters enclosed within double quotes, e.g. "1-2-3". Currently, we store the name as an unescaped string, which causes problems during the pgwire DESCRIBE step when looking up the cursor. We should be using the tree.Name datatype for the cursor name while storing and looking up cursors. This PR updates the code to use tree.Name instead of raw strings for handling cursor names, fixing the issue where the pgwire DESCRIBE step fails while attempting to look up cursors with names containing special characters. Resolves #84261. Release note (bug fix): The pgwire DESCRIBE step no longer fails with an error while attempting to look up cursors declared with names containing special characters.
- 86492: bulk: perform meta lookup on range cache miss during index backfill (r=dt a=nvanbenschoten). Fixes #84290. This commit addresses the sustained slowdown described in #84290 by replacing the call in `SSTBatcher.flushIfNeeded` to `RangeCache.GetCached` with a call to `RangeCache.Lookup`. The former checks the cache but returns no range descriptor if the cache currently has no overlapping key, which is possible if the descriptor was recently evicted because it was stale. The latter performs a meta lookup if the cache currently has no overlapping key, so it should always return _some_ range descriptor. There's a question of whether we should log a warning but proceed if this meta lookup fails; for now, I've decided not to change that behavior. Release justification: None. Don't merge yet.
- 87885: kv: remove broken attempt to reject lease acquisitions on draining nodes (r=nvanbenschoten a=nvanbenschoten). Related to #83261. This commit removes "protection" that avoided lease acquisitions on draining nodes. That protection had already been effectively disabled by acc1ad1, which allowed Raft leaders to bypass the check. As the comment here (added in 5ffaa9e) explained, Raft followers are already unable to acquire the lease. If leaders bypass the check and followers (and candidates) don't need it, the check is useless, so we remove it. The commit also removes `TestReplicaDrainLease`, which was supposed to test this behavior. We remove the test not because it started failing after the change, but because it did not; it must not have been testing anything real anymore after acc1ad1. Release justification: low risk change related to release blocker. Release note: None.
- 88205: kvserver: (partially) deflake transfer-leases/drain-other-node (r=irfansharif a=irfansharif). In #85629 we changed our lease transfer protocol to only ever transfer expiration-based leases and have recipients later upgrade them to the more efficient epoch-based ones. This was done to limit the effects of ill-advised lease transfers, since the incoming leaseholder needs to recognize itself as such within a few seconds, so we wanted the upgrade to happen after the lease had been received. In #83261, however, we noticed that the upgrade was not immediate: we were waiting until the current lease's expiration was within its renewal duration (4.5s). When the lease was eventually renewed the upgrade did happen, but it was not immediate. We fix this here and remove the manual clock advancement in the supporting test that had masked this issue. The test now demonstrates that we no longer rely on upgrades happening as part of the (slow) renewal process. Release note: None.

Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
blathers-crl bot pushed a commit that referenced this issue on Sep 21, 2022:
bulk: perform meta lookup on range cache miss during index backfill (#86492)
nvanbenschoten added a commit that referenced this issue on Nov 16, 2022:
bulk: perform meta lookup on range cache miss during index backfill (#86492)
During an index creation on a 6TB table in a cluster with 15 nodes (48 vCPU per instance), I noticed a surprising slowdown. After running for about 12 hours, the index creation slowed to a trickle (~1% of peak ingestion throughput). About an hour later, it sped up to about 10% of peak ingestion throughput.
The `distsender.rpc.addsstable.sent` metric revealed an interesting pattern. Most nodes had stopped sending AddSSTable requests entirely; they were finished. However, much earlier in the backfill, two of the nodes had fallen off dramatically in their rate of sending AddSSTable requests. Interestingly, at different times, these two nodes eventually did pick the rate back up and quickly finished.
I was able to grab logs during the slow period from each of these nodes: cockroach.log
I was also able to grab logspy output with high verbosity on the `sst_batcher.go` file: full_logspy.txt. We see an interesting pattern here. During the slowdown, the index backfiller appears to be stuck in a regime where it repeatedly hits `SSTable cannot be added spanning range bounds` lines, which are meant to be rare. In one of my logspy attempts (one of dozens; I got lucky), I was able to catch a pair of log lines that I think hints at the problem:
It seems that at some point, the SSTBatcher checked whether it should perform a `rangeFlush`: cockroach/pkg/kv/bulk/sst_batcher.go, lines 322 to 328 at 4566e92.
When checking, it did not find a cached range descriptor for the first key in the batch. As a result, it decided not to perform a range flush for the current batch and instead waited until the batch reached 16 MiB before performing a size flush.
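To make that decision concrete, here is a minimal sketch of the kind of check being described. This is not the actual CockroachDB source: `rangeDesc`, `rangeCache`, `shouldRangeFlush`, and the key strings are simplified stand-ins (the real cache is keyed by key spans, not single keys). The point is only that a `GetCached` miss silently disables the range flush and leaves the batch to grow to the 16 MiB size-flush threshold.

```go
package main

import "fmt"

// rangeDesc is a simplified stand-in for a cached range descriptor.
type rangeDesc struct {
	endKey string
}

// rangeCache is a simplified stand-in for the real range cache: GetCached
// only consults the cache and returns nil on a miss.
type rangeCache struct {
	// Keyed by the batch start key for this sketch; the real cache is keyed
	// by key spans.
	cached map[string]*rangeDesc
}

func (rc *rangeCache) GetCached(startKey string) *rangeDesc {
	return rc.cached[startKey]
}

// sizeFlushThreshold mirrors the 16 MiB buffer limit mentioned in the issue.
const sizeFlushThreshold = 16 << 20

// shouldRangeFlush mirrors the behavior described above: with no cached
// descriptor for the batch's start key, we never range-flush, so the batch
// keeps growing until it hits sizeFlushThreshold and is size-flushed instead.
func shouldRangeFlush(rc *rangeCache, batchStartKey, nextKey string) bool {
	desc := rc.GetCached(batchStartKey)
	if desc == nil {
		// Cache miss: skip the range flush entirely.
		return false
	}
	// Flush before the batch crosses the cached range's end key.
	return nextKey >= desc.endKey
}

func main() {
	rc := &rangeCache{cached: map[string]*rangeDesc{}}
	// With nothing cached for the batch's start key, no range flush happens.
	fmt.Println(shouldRangeFlush(rc, "/Table/60/2/a", "/Table/60/2/m")) // false
}
```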
This did not go well. At the time of this size flush, the index keyspace had 11971 ranges. As a result, the 16 MiB buffer was split into chunks of about 5 KiB each (according to logspy), and we then sent these small AddSSTables one by one. Each took about 50ms, so we sent roughly 20 AddSSTables per second, which meant we spent about 160 seconds flushing the 16 MiB buffer.
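Spelling out that arithmetic with the numbers above (the figures come from the logspy observations; nothing new is measured here):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		bufferBytes = 16 << 20              // 16 MiB size-flush buffer
		chunkBytes  = 5 << 10               // ~5 KiB per-range chunk seen in logspy
		perRequest  = 50 * time.Millisecond // observed AddSSTable latency
	)
	chunks := bufferBytes / chunkBytes
	total := time.Duration(chunks) * perRequest
	fmt.Printf("%d chunks x %v each = %v\n", chunks, perRequest, total)
	// Prints: 3276 chunks x 50ms each = 2m43.8s, i.e. roughly the ~160 seconds
	// observed for a single 16 MiB flush.
}
```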
I think the takeaway here is that a RangeCache miss can undermine #79967, which becomes increasingly important with large data sizes. Perhaps we should replace the call to `RangeCache.GetCached` with one to `RangeCache.Lookup` so that we perform a range cache lookup on a cache miss; we're going to send to the range anyway, so a cache lookup now vs. later shouldn't make a difference. One thing I don't understand is why we kept hitting range cache misses once in this regime. Perhaps that hints at something else going wrong as well. Or perhaps there was a gap in our range cache that kept causing us to skip range flushes but was also never filled during a size flush.
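For concreteness, here is a minimal sketch of that proposed direction, using the same simplified stand-ins as the earlier sketch rather than the real `RangeCache` API; `metaLookup` is a hypothetical hook standing in for the meta-range lookup that a real `Lookup` would perform on a miss.

```go
package main

import (
	"fmt"
	"log"
)

// The same simplified stand-ins as in the earlier sketch; metaLookup is a
// hypothetical hook standing in for a lookup against the range meta entries.
type rangeDesc struct{ endKey string }

type rangeCache struct {
	cached     map[string]*rangeDesc
	metaLookup func(startKey string) (*rangeDesc, error)
}

// Lookup consults the cache and, on a miss, falls back to the meta lookup, so
// it always returns some descriptor (or an error) rather than nil.
func (rc *rangeCache) Lookup(startKey string) (*rangeDesc, error) {
	if desc, ok := rc.cached[startKey]; ok {
		return desc, nil
	}
	desc, err := rc.metaLookup(startKey)
	if err != nil {
		return nil, err
	}
	rc.cached[startKey] = desc
	return desc, nil
}

// shouldRangeFlush makes the same decision as in the earlier sketch, except
// that a cache miss now triggers a lookup instead of silently skipping the
// range flush; a failed lookup is logged and we fall back to the size flush.
func shouldRangeFlush(rc *rangeCache, batchStartKey, nextKey string) bool {
	desc, err := rc.Lookup(batchStartKey)
	if err != nil {
		log.Printf("range lookup failed for %s: %v; falling back to size flush", batchStartKey, err)
		return false
	}
	return nextKey >= desc.endKey
}

func main() {
	rc := &rangeCache{
		cached: map[string]*rangeDesc{},
		metaLookup: func(string) (*rangeDesc, error) {
			return &rangeDesc{endKey: "/Table/60/2/g"}, nil
		},
	}
	// The cache starts empty, but the lookup still yields a descriptor, so the
	// batcher can keep flushing at range boundaries instead of buffering 16 MiB.
	fmt.Println(shouldRangeFlush(rc, "/Table/60/2/a", "/Table/60/2/m")) // true
}
```

Whether a failed lookup should merely log a warning and fall back, as sketched here, is the open question noted in the #86492 description above.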
cc. @dt
Jira issue: CRDB-17573