backupccl: investigate if restore on retry gradually slows down because of CheckForKeyCollisions #81116
cc @cockroachdb/bulk-io
Seemed to complete once they took away secondary indexes. Maybe a red herring -- could point to secondary index overlaps and/or nefarious behavior near disk full and/or something else completely. @smcvey were we able to pull a CPU profile during the modified IMPORT?
I started looking into this as part of support#1539 and I think there is some smoke here. Just dumping some initial findings.

Basic Repro Steps
Vanilla run:
Run with 2 pauses at 25% and 50%:

I instrumented the binary to better understand restore's checkpointing logic, in particular https://github.com/cockroachdb/cockroach/blob/v21.2.7/pkg/ccl/backupccl/restore_job.go#L518. In short, whenever a …

In a vanilla run:
We start with 389 spans.
340 of the 389 spans' progress updates do not result in the high watermark being forwarded because of the order in which they were received.

While this might not be very useful as is, it seems to indicate that we can easily get into a situation where we've ingested far more than we can progress the high watermark to. Let's see how this behaves in a paused and resumed restore.

In a run paused a couple of times:
Starts with 389 spans.
At the first pause, we've received 144 updates but only 6 contiguous ones and so our high watermark is only 6. So we're effectively throwing away 138 restore spans that have been ingested?
At the second pause, we received 167 progress updates but only progressed high watermark to 104.
I have to step away for a bit, but the next step for me is to instrument …
I disabled …

Every restore span we ingest uses its own SSTBatcher; here are some entries before the pause:
Here are some entries after. I'm still curious where the other part of the …
Proposal: we switch restore progress to use a span frontier instead of the slice of progress indexes that it uses today.

The first time we run a restore job we will compute the … Every time we receive a progress update from the restore data processor for a particular span, we will forward its timestamp in the frontier to the restore's AOST, essentially marking the span as restored until that timestamp. On subsequent job resumptions we then fetch the frontier entries that have a ts less than the restore AOST and consider those the remaining work to do. This way we prevent redoing work just because a lagging restore span has held up the high watermark, as happens in the current implementation. Thoughts?
This change flips the `disallowShadowing` boolean in the SST batcher used by RESTORE to false, in release builds. disallowShadowing is set to `false` because RESTORE, which is the sole user of this SSTBatcher, is expected to be ingesting into an empty keyspace. If a restore job is resumed, the un-checkpointed spans that are re-ingested will perfectly shadow (equal key, value, and ts) the already ingested keys. disallowShadowing used to be set to `true` because it was believed that even across resumptions of a restore job, `checkForKeyCollisions` would be inexpensive because of our frequent job checkpointing. Further investigation in cockroachdb#81116 revealed that our progress checkpointing could significantly lag behind the spans we have ingested, making a resumed restore spend a lot of time in `checkForKeyCollisions`, leading to severely degraded performance. We have *never* seen a restore fail because of the invariant enforced when `disallowShadowing` is set to true, and so we feel comfortable flipping this check to false. A future version will work on fixing our progress checkpointing so that we do not have a buildup of un-checkpointed work, at which point we can reassess flipping `disallowShadowing` to true.

Release note: None
This roachtest pauses the restore job every 10 minutes and resumes it after it has reached a paused state. This was an area of testing we were lacking, which allowed performance degradations like cockroachdb#81116 to go unnoticed.

Informs: cockroachdb#81116

Release note: None
disallowShadowingBelow is set to an empty hlc.Timestamp in release builds, i.e. all shadowing is allowed without AddSSTable having to check for overlapping keys. This is because RESTORE is expected to ingest into an empty keyspace. If a restore job is resumed, the un-checkpointed spans that are re-ingested will perfectly shadow (equal key, value, and ts) the already ingested keys. disallowShadowingBelow used to be unconditionally set to logical=1. This permissive value would allow shadowing in case the RESTORE has to retry ingestions, but served to force evaluation of AddSSTable to check for overlapping keys. It was believed that even across resumptions of a restore job, `checkForKeyCollisions` would be inexpensive because of our frequent job checkpointing. Further investigation in cockroachdb#81116 revealed that our progress checkpointing could significantly lag behind the spans we have ingested, making a resumed restore spend a lot of time in `checkForKeyCollisions`, leading to severely degraded performance. We have *never* seen a restore fail because of the invariant enforced by setting `disallowShadowingBelow` to a non-empty value, and so we feel comfortable disabling this check entirely. A future release will work on fixing our progress checkpointing so that we do not have a buildup of un-checkpointed work, at which point we can reassess reverting to logical=1.

Informs: cockroachdb#81116

Release note: None
Fixes: cockroachdb#81116, cockroachdb#87843 Release note (performance improvement): Previously, whenever a user resumed a paused `RESTORE` job the checkpointing mechanism would potentially not account for completed work. This change allows completed spans to be skipped over when restoring.
This should no longer be an issue with our new checkpointing procedure: #97862
In a support issue, we noticed CPU profiles from several nodes in a cluster (21.2.10) running a restore of a few TB being dominated by `CheckForKeyCollisions`.

Restore ingests into an empty keyspace, and the expectation is that this check is essentially a no-op. If, however, the restore hits a transient error or a condition that makes it retry from the last checkpoint, then we might end up ingesting data into a non-empty keyspace. In that case, this check has the potential to become increasingly expensive. In the support issue, we noticed the restore slow to a crawl.
We have investigated similar slowdowns in out-of-order imports (#66410) and are working on optimizations for the same (#80980). Since restore and import share the ingestion pipeline, this issue will benefit from the work being done for import, but we should attempt to reproduce what we saw in the support issue by forcing a restore to retry at different stages and seeing if it has a long tail because of the SST collision check.
Jira issue: CRDB-15138
Epic CRDB-20916