Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: add verify_backup_table_data option to RESTORE #86136

Merged
merged 2 commits into from
Aug 21, 2022

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Aug 15, 2022

Release note (sql change): this patch adds the verify_backup_table_data flag to
RESTORE. When the user passes this flag, along with the required schema_only
flag, a schema_only RESTORE will get run and all user data will get read from
external storage, checksummed, and disarded before getting written to disk.

This flag provides two additional validation steps that a regular schema_only
RESTORE and a SHOW BACKUP with check_files cannot provide: This RESTORE
verifies that all data can get read and rekeyed to the Restoring Cluster, and
that all data passes a checksum check.

Release justification: low risk, high impact change to improve restore
validation

@msbutler msbutler self-assigned this Aug 15, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler changed the title Butler refactor tenant rekey backupccl: add verify_backup_table_data option to RESTORE Aug 15, 2022
@msbutler msbutler requested review from rhu713 and dt August 15, 2022 14:23
@msbutler msbutler marked this pull request as ready for review August 15, 2022 14:23
@msbutler msbutler requested review from a team as code owners August 15, 2022 14:23
@msbutler msbutler requested a review from a team August 15, 2022 14:23
@msbutler msbutler requested review from a team as code owners August 15, 2022 14:23
@msbutler msbutler force-pushed the butler-refactor-tenant-rekey branch 2 times, most recently from 8b4579a to ca97c04 Compare August 18, 2022 15:09
@@ -1612,6 +1612,9 @@ type BackupRestoreTestingKnobs struct {
// testing. This is typically the bulk mem monitor if not
// specified here.
BackupMemMonitor *mon.BytesMonitor

// RecoverFromIterClosePanic prevents the node from panicing during ReadAsOfIterator.Close
RecoverFromIterPanic bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, we shouldn't need this if we add a no-panic option to pebble iter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

planning to rebase on #86423 which will have the fix.

// sendIter sends a multiplexed iterator covering the currently accumulated files over the
// channel.
sendIter := func(iter storage.SimpleMVCCIterator, dirsToSend []cloud.ExternalStorage) error {
readAsOfIter := storage.NewReadAsOfIterator(iter, rd.spec.RestoreTime)

cleanup := func() {
if recoverFromIterPanic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, we shouldn't need this once we fix the pebble iterator.

pkg/ccl/backupccl/restore_data_processor.go Outdated Show resolved Hide resolved
return nil, nil, nil, err
}
if !backupCodec.TenantPrefix().Equal(p.ExecCfg().Codec.TenantPrefix()) {
// Ensure old processors fail if this is a previously unsupported restore of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this x-version code isn't needed anymore, but I don't remember if we rely on the zero rekey for anything else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I grepped for rekey.OldID == 0 ,poison-pill, and execinfrapb.TableRekey{} throughout the codebase and no other application turned up.

Given that all 22.1 processors know how to rekey, I think it's safe remove introducing the poison pill in restore_job.go. 22.2 processors still need to filter over the poison pill if a 22.1 processor planned the job.

God forbid if a RESTORE can run over 2 version upgrades.

pkg/ccl/backupccl/restore_job.go Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-refactor-tenant-rekey branch 2 times, most recently from 6c7b433 to febc9aa Compare August 19, 2022 22:48
This small refactor pushes tenant rekeying logic from the main restore_job
Resume() function into createImportingDescriptors.

Release note: None
Release note (sql change): this patch adds the verify_backup_table_data flag to
RESTORE. When the user passes this flag, along with the required schema_only
flag, a schema_only RESTORE will get run _and_ all user data will get read from
external storage, checksummed, and disarded before getting written to disk.

This flag provides two additional validation steps that a regular schema_only
RESTORE and a SHOW BACKUP with check_files cannot provide: This RESTORE
verifies that all data can get read and rekeyed to the Restoring Cluster, and
that all data passes a checksum check.

Release justification: low risk, high impact change to improve restore
validation
@msbutler msbutler force-pushed the butler-refactor-tenant-rekey branch from febc9aa to 58df28d Compare August 20, 2022 18:57
@msbutler
Copy link
Collaborator Author

bors r=dt

@craig
Copy link
Contributor

craig bot commented Aug 21, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants