backupccl: add verify_backup_table_data option to RESTORE #86136
@@ -1612,6 +1612,9 @@ type BackupRestoreTestingKnobs struct {
	// testing. This is typically the bulk mem monitor if not
	// specified here.
	BackupMemMonitor *mon.BytesMonitor

	// RecoverFromIterPanic prevents the node from panicking during ReadAsOfIterator.Close.
	RecoverFromIterPanic bool
as discussed offline, we shouldn't need this if we add a no-panic option to pebble iter
planning to rebase on #86423 which will have the fix.
// sendIter sends a multiplexed iterator covering the currently accumulated files over the
// channel.
sendIter := func(iter storage.SimpleMVCCIterator, dirsToSend []cloud.ExternalStorage) error {
	readAsOfIter := storage.NewReadAsOfIterator(iter, rd.spec.RestoreTime)

	cleanup := func() {
		if recoverFromIterPanic {
as discussed offline, we shouldn't need this once we fix the pebble iterator.
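For readers without the full diff, the pattern under discussion can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the closer interface, closeRecovering, panickyIter, and quietIter are hypothetical names, and the real cleanup function in the restore processor is truncated in the hunk above.

```go
package main

import "fmt"

// closer is a hypothetical stand-in for an iterator that exposes Close.
type closer interface{ Close() }

// closeRecovering calls c.Close. When recoverFromPanic is true, a panic
// raised inside Close is converted into a returned error instead of
// crashing the process. When it is false, the panic propagates.
func closeRecovering(c closer, recoverFromPanic bool) (err error) {
	if recoverFromPanic {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("recovered from panic during Close: %v", r)
			}
		}()
	}
	c.Close()
	return nil
}

// panickyIter simulates an iterator whose Close panics.
type panickyIter struct{}

func (panickyIter) Close() { panic("iterator closed with pending error") }

// quietIter simulates an iterator whose Close succeeds.
type quietIter struct{ closed bool }

func (q *quietIter) Close() { q.closed = true }
```

Calling closeRecovering(panickyIter{}, true) returns a non-nil error, while the same call with false would panic. That is why the reviewers prefer fixing the iterator itself over keeping a testing knob like this one.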
	return nil, nil, nil, err
}
if !backupCodec.TenantPrefix().Equal(p.ExecCfg().Codec.TenantPrefix()) {
	// Ensure old processors fail if this is a previously unsupported restore of
I think this x-version code isn't needed anymore, but I don't remember if we rely on the zero rekey for anything else
I grepped for rekey.OldID == 0, poison-pill, and execinfrapb.TableRekey{} throughout the codebase and no other use turned up.
Given that all 22.1 processors know how to rekey, I think it's safe to stop introducing the poison pill in restore_job.go. 22.2 processors still need to filter out the poison pill if a 22.1 processor planned the job.
God forbid a RESTORE runs across 2 version upgrades.
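As a rough illustration of the filtering mentioned above, a processor could drop zero-ID rekeys before using the list. This is a sketch under assumptions: tableRekey is a simplified, hypothetical stand-in for execinfrapb.TableRekey, and filterPoisonPills is an invented helper name, not a function from the codebase.

```go
package main

// tableRekey is a simplified, hypothetical stand-in for
// execinfrapb.TableRekey; only the fields needed here are modeled.
type tableRekey struct {
	OldID   uint32
	NewDesc []byte
}

// filterPoisonPills returns the rekeys with the poison-pill entries
// (those whose OldID is 0) removed. The input slice is not modified.
func filterPoisonPills(rekeys []tableRekey) []tableRekey {
	out := make([]tableRekey, 0, len(rekeys))
	for _, r := range rekeys {
		if r.OldID == 0 {
			// A zero OldID marks the poison pill that an older
			// planner may have added; skip it.
			continue
		}
		out = append(out, r)
	}
	return out
}
```

Given a list with OldIDs 0, 52, and 53, the helper returns only the entries for 52 and 53.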
This small refactor pushes tenant rekeying logic from the main restore_job Resume() function into createImportingDescriptors. Release note: None
Release note (sql change): this patch adds the verify_backup_table_data flag to RESTORE. When the user passes this flag, along with the required schema_only flag, a schema_only RESTORE runs and all user data is read from external storage, checksummed, and discarded instead of being written to disk. This flag provides two additional validation steps that a regular schema_only RESTORE and a SHOW BACKUP with check_files cannot provide: the RESTORE verifies that all data can be read and rekeyed to the restoring cluster, and that all data passes a checksum check.
Release justification: low risk, high impact change to improve restore validation
bors r=dt
Build succeeded