backupccl: parallelize loading of manifests from External Storage #87311
Comparing two traces of a Old trace: This is a summary of where we are spending time in this operation. Each row is an aggregation of all operations with that operation name across the execution of the show backup:
Taking the hierarchy of method calls into account we can infer that the majority of the time is spent in
and
The majority of the time in both methods is spent in the call to New trace: In the new trace where we have changed the
The total time goes down for this
For bulk folks, the context for this change is https://cockroachlabs.slack.com/archives/C03RU3US2H1/p1661190445763609
Note,
Yay to this PR! @adityamaru, can we check what this PR does to the ~2s Aditya, I'm sure the testing you've done here is a better stress test of
Definitely makes sense to run on what we actually want to improve. @dasrirez if you want to take this patch for a spin and grab a trace using
This is wonderful! I ran the same simple test with 1 full + 24 incremental backups taken from a fresh cluster and the difference is absurd.
Note that I issued the queries from a local cluster (average of 8 trials), where the bucket lookup latency is much higher than when these queries are issued on the cloud cluster. On the cloud cluster, a backup with a single incremental takes 0.3s to query on average, which is why the backup with 24 incrementals reported 2s without the patch, compared to the 4.5s I got from my local cluster. @adityamaru I will add the stmt bundle for the 24 incremental query w/ the patch in the SendSafely workspace since it contains some creds!
Yay! And such fun DR + SRE collaboration too.
Thanks for the trace, I think the optimization is working as expected: In the trace if we start at the top-level
Drilling down:
Largely accounts for the 1.4s. So let's look at each: First
Now in
The most interesting one pertaining to this optimization is
Since the reading of manifests happens concurrently the read path should take roughly
This change is a targeted change to parallelize the loading of backup manifests for each incremental layer of a backup. This method is shared by both restore and `SHOW BACKUP`.

Release note: None

Release justification: low risk performance improvement required for making `SHOW BACKUP` in the presence of many incremental layers more performant
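The idea of loading one manifest per layer in parallel can be sketched as follows. This is a minimal illustration, not the code in this PR: `manifest`, `fetchFunc`, and `loadManifestsConcurrently` are hypothetical names, and the sketch uses a plain `sync.WaitGroup` with no limit on the number of concurrent fetches, which the real change may well handle differently.

```go
package main

import (
	"fmt"
	"sync"
)

// manifest is a hypothetical stand-in for a backup manifest.
type manifest struct {
	URI string
}

// fetchFunc is a hypothetical function that reads one manifest from
// external storage.
type fetchFunc func(uri string) (manifest, error)

// loadManifestsConcurrently fetches one manifest per layer URI in parallel.
// Each goroutine writes only to its own index, so the output order matches
// the input order and no extra locking is needed.
func loadManifestsConcurrently(uris []string, fetch fetchFunc) ([]manifest, error) {
	manifests := make([]manifest, len(uris))
	errs := make([]error, len(uris))

	var wg sync.WaitGroup
	for i, uri := range uris {
		wg.Add(1)
		go func(i int, uri string) {
			defer wg.Done()
			manifests[i], errs[i] = fetch(uri)
		}(i, uri)
	}
	wg.Wait()

	// Report the first failure in layer order.
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("loading manifest for layer %d (%s): %w", i, uris[i], err)
		}
	}
	return manifests, nil
}

func main() {
	uris := []string{"full", "inc-1", "inc-2"}
	fetch := func(uri string) (manifest, error) { return manifest{URI: uri}, nil }

	manifests, err := loadManifestsConcurrently(uris, fetch)
	if err != nil {
		panic(err)
	}
	for i, m := range manifests {
		fmt.Println(i, m.URI)
	}
}
```

Writing results by index keeps the layers in chain order, which matters to any caller that processes the base backup before its incrementals.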
The Bazel failures in bors r=benbardin
Build failed (retrying...):
Build succeeded:
blathers backport release-22.2
In cockroachdb#87311 we refactored `ResolveBackupManifests` to concurrently load manifests from base and incremental layers by calling `FetchPreviousBackups`. The mistake in that diff was to pass only the incremental layers of the backup to `FetchPreviousBackups`, instead of passing the base backup + incremental layers. This was silently passing all tests because in `ResolveBackupManifests` we explicitly set up the base backup layer in all the relevant slices before processing each incremental layer.

The one case where this was not okay was encrypted backups. `FetchPreviousBackups`, apart from doing the obvious, would also read the encryption options from the base backup before reading the manifests from each layer. Because we stopped sending the base backup in the input slice, this step would fail: the method would go to the first incremental backup (instead of the base backup) and attempt to read the ENCRYPTION_INFO file. This file is only ever written to the base backup, and so a `SHOW BACKUP` or a `RESTORE` of an encrypted backup would fail with a file not found error.

In this diff we:
1) Fix `ResolveBackupManifests` by passing in the base backup + incremental backups to `FetchPreviousBackups`.
2) Make it the caller's responsibility to pass in the fully hydrated encryption options before calling `FetchPreviousBackups`, so that the method is *only* fetching backup manifests.

Fixes: cockroachdb#91886

Release note (bug fix): fixes a bug that would cause `SHOW BACKUP` and `RESTORE` of encrypted incremental backups to fail
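The failure mode and the shape of the fix can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: `storage`, `readEncryptionInfo`, and `fetchManifests` are hypothetical helpers, not the real `backupccl` functions, and the manifest file name is a placeholder chosen for illustration. Only the ENCRYPTION_INFO file and the fact that it lives in the base backup alone come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

var errFileNotFound = errors.New("file not found")

// storage is a hypothetical in-memory stand-in for external storage:
// layer URI -> file name -> contents.
type storage map[string]map[string]string

func (s storage) readFile(uri, name string) (string, error) {
	contents, ok := s[uri][name]
	if !ok {
		return "", fmt.Errorf("%s/%s: %w", uri, name, errFileNotFound)
	}
	return contents, nil
}

// readEncryptionInfo reads ENCRYPTION_INFO from the first layer it is given.
// It only succeeds when that first layer is the base backup.
func readEncryptionInfo(s storage, layers []string) (string, error) {
	if len(layers) == 0 {
		return "", errors.New("no layers given")
	}
	return s.readFile(layers[0], "ENCRYPTION_INFO")
}

// fetchManifests only fetches manifests. The caller supplies the already
// resolved encryption options (assumption: a plain string here).
func fetchManifests(s storage, layers []string, enc string) ([]string, error) {
	_ = enc // a real implementation would use this to decrypt each manifest
	out := make([]string, 0, len(layers))
	for _, uri := range layers {
		m, err := s.readFile(uri, "MANIFEST") // placeholder file name
		if err != nil {
			return nil, err
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	s := storage{
		"full":  {"ENCRYPTION_INFO": "salt", "MANIFEST": "m0"},
		"inc-1": {"MANIFEST": "m1"},
	}

	// Buggy call shape: only incremental layers are passed in, so the
	// lookup hits a layer that never had ENCRYPTION_INFO written to it.
	_, err := readEncryptionInfo(s, []string{"inc-1"})
	fmt.Println("incrementals only:", err)

	// Fixed call shape: the caller resolves encryption options from the
	// base backup, then passes base + incrementals to the manifest fetch.
	layers := []string{"full", "inc-1"}
	enc, err := readEncryptionInfo(s, layers)
	if err != nil {
		panic(err)
	}
	manifests, err := fetchManifests(s, layers, enc)
	if err != nil {
		panic(err)
	}
	fmt.Println("base + incrementals:", manifests)
}
```

Keeping the encryption lookup out of the manifest fetch means the fetch no longer depends on which layer happens to be first in its input.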
91911: backupccl: fix bug in resolving encrypted backup manifests r=stevendanna a=adityamaru
Co-authored-by: adityamaru <[email protected]>
This change is a targeted change to parallelize the loading
of backup manifests for each incremental layer of a backup.
This method is shared by both restore and `SHOW BACKUP`.

Fixes: #87183

Release note: None

Release justification: low risk performance improvement required
for making `SHOW BACKUP` in the presence of many incremental layers
more performant