-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: mark type descs as dropped if there is a failure in restore #62278
Conversation
5defdf1
to
9900c91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! LGTM with a minor testing comment.
// Back up the database, drop it, and restore into it. | ||
sqlDB.Exec(t, `BACKUP DATABASE db TO 'nodelocal://0/test/'`) | ||
sqlDB.Exec(t, `DROP DATABASE db`) | ||
sqlDB.ExpectErr(t, "boom", `RESTORE DATABASE db FROM 'nodelocal://0/test/'`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an assertion that the relevant descriptors are in the DROP state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, they're only written in the DROP state for the purpose of the desc collection, we still have to overwrite this and delete the descriptor entry from system.descriptor (as before).
Not quite sure how to test the descriptor entry has been removed successfully, as the ID is unknown and the namespace entry has been removed as well. Are there any existing tests I could take inspiration from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this sort of thing should have been caught with testing at a different level. Like for all of the tests where we restore from real backups, it feels like we should inject failures at various stages and make sure that those all roll back correctly. For another PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @pbardea)
pkg/ccl/backupccl/backup_test.go, line 6849 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
To clarify, they're only written in the DROP state for the purpose of the desc collection, we still have to overwrite this and delete the descriptor entry from system.descriptor (as before).
Not quite sure how to test the descriptor entry has been removed successfully, as the ID is unknown and the namespace entry has been removed as well. Are there any existing tests I could take inspiration from?
Can you just check that crdb_internal.invalid_objects
is empty?
Maybe also validate that without your change that it isn't.
Previously, when cleaning up type descriptors from a failed restore, we would directly delete the system.descriptor entry. This is bad because writing directly to the system.descriptor table means we bypass the descriptor collection, which doesn't know these descriptors have been dropped. The descriptor collection validates all uncommitted descriptors before writing them. As the descriptor collection doesn't know type descriptors have been dropped, validating cross references was always bound to fail. Put another way, if a restore from a backup which contained user defined types failed for any reason, we were bound to require manual cleanup. This patch fixes this problem by going through the descriptor collection and writing the descriptor in DROPPED state in addition to deleting the system.descriptor entry. Release note (bug fix): Fixed a bug where a failed restore from a backup including user defined types would require manual cleanup.
9900c91
to
ee85a1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @pbardea)
pkg/ccl/backupccl/backup_test.go, line 6849 at r1 (raw file):
Previously, ajwerner wrote…
Can you just check that
crdb_internal.invalid_objects
is empty?Maybe also validate that without your change that it isn't.
crdb_internal.invalid_objects
was empty without my changes as well because when the restore job failed it left behind all descriptor entries (not just the type descriptor), so the descriptor could be validated. I realized system.namespace
is fair game though, so that's what I'm doing now.
TFTRs! bors r=ajwerner,pbardea |
Build succeeded: |
Previously, when cleaning up type descriptors from a failed restore, we
would directly delete the system.descriptor entry. This is bad because
writing directly to the system.descriptor table means we bypass the
descriptor collection, which doesn't know these descriptors have been
dropped.
The descriptor collection validates all uncommitted descriptors before
writing them. As the descriptor collection doesn't know type
descriptors have been dropped, validating cross references was always
bound to fail. Put another way, if a restore from a backup which
contained user defined types failed for any reason, we were bound to
require manual cleanup.
This patch fixes this problem by going through the descriptor
collection and writing the descriptor in DROPPED state in addition to
deleting the system.descriptor entry.
Release note (bug fix): Fixed a bug where a failed restore from a
backup including user defined types would require manual cleanup.