-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrades: try to repair catalog corruptions during upgrade #105832
Conversation
307a1cd
to
3731817
Compare
3731817
to
cc16179
Compare
282d0f6
to
c5c4243
Compare
c5c4243
to
9558d44
Compare
9558d44
to
3b28c87
Compare
This is ready for review, I'll reconcile the conflicts around hardcoded OIDs before borsing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 6 of 6 files at r2, 16 of 16 files at r3, 4 of 4 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, @postamar, and @rafiss)
pkg/sql/sem/builtins/builtins.go
line 5226 at r4 (raw file):
SELECT CASE corruption WHEN 'descriptor'
Maybe document the corruption types in the code, just to make things a bit cleaner.
pkg/upgrade/upgrades/first_upgrade.go
line 139 at r5 (raw file):
} else if hasRows { // Attempt to repair catalog corruptions. log.Info(ctx, "auto-repairing catalog corruptions detected during upgrade attempt")
From a diagnosability viewpoint, do you think logging which descriptors/corruption type is worth logging? It might be helpful for customer scenarios where we didn't cover something.
pkg/sql/logictest/testdata/logic_test/schema_repair
line 284 at r4 (raw file):
SELECT * FROM "".crdb_internal.kv_repairable_catalog_corruptions ORDER BY 1,2,3,4 ---- 104 105 _corrupt_backref_typ 114 descriptor
Nice!
3b28c87
to
f32a362
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @fqazi, @knz, and @rafiss)
pkg/sql/sem/builtins/builtins.go
line 5226 at r4 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Maybe document the corruption types in the code, just to make things a bit cleaner.
Done, in the vtable definition mostly.
pkg/upgrade/upgrades/first_upgrade.go
line 139 at r5 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
From a diagnosability viewpoint, do you think logging which descriptors/corruption type is worth logging? It might be helpful for customer scenarios where we didn't cover something.
Good question. I'm worried there's a real possibility of there being too many of them. In fact, I should perform the repair in batches!
pkg/sql/logictest/testdata/logic_test/schema_repair
line 284 at r4 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Nice!
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @knz, @postamar, and @rafiss)
pkg/upgrade/upgrades/first_upgrade.go
line 139 at r5 (raw file):
Previously, postamar (Marius Posta) wrote…
Good question. I'm worried there's a real possibility of there being too many of them. In fact, I should perform the repair in batches!
Batching here makes tons of sense. I guess we could limit that type of logging into a small number or the First N of these, but I'll leave that up to you.
Previously, function descriptors were not validated when querying this virtual table. This patch fixes this. Informs cockroachdb#104418. Release note: None
Descriptor repairs which can be done in post-deserialization should always be performed then and not elsewhere, otherwise we jeopardize the effectiveness of the recently-added catalog corruption repair builtin. This has no impact otherwise. Release note: None
This virtual view lists system.descriptor and system.namespace entries which can be repaired using crdb_internal.unsafe_* and crdb_internal.repaired_descriptor builtin functions. Conceptually, this view can be thought of as a projection from crdb_internal.invalid_objects: anything that shows up in this view should also show up in that table. Informs cockroachdb#104425. Release note: None
This new builtin is meant to be used with the recently-added crdb_internal.kv_repairable_catalog_corruptions virtual table. Typically, SELECT crdb_internal.repair_catalog_corruption(id, corruption) FROM "".crdb_internal.kv_repairable_catalog_corruptions will ensure that a subsequent query of the vtable will return empty. Informs cockroachdb#104425. Release note: None
This change enhances the recently-added upgrade precondition check which checks for catalog corruptions by trying to repair these automatically when possible. Fixes: cockroachdb#104425 Fixes: cockroachdb#85265 Release note (general change): upgrading the cluster version to a new release will not only check for descriptor- and other catalog corruptions but will attempt to repair some of them on a best-effort basis. This should seamlessly get rid of all longstanding descriptor back-reference corruptions, which typically don't manifest themselves until a schema change or an upgrade are performed.
f32a362
to
04bb9e7
Compare
I'll leave that for a later time, I think. bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from d8a2197 to blathers/backport-release-23.1-105832: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This change enhances the recently-added upgrade precondition check which
checks for catalog corruptions by trying to repair these automatically
when possible.
Fixes: #104425
Fixes: #85265
Release note (general change): upgrading the cluster version to a new
release will not only check for descriptor- and other catalog
corruptions but will attempt to repair some of them on a best-effort
basis. This should seamlessly get rid of all longstanding descriptor
back-reference corruptions, which typically don't manifest themselves
until a schema change or an upgrade are performed.