Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: upgrades: try to repair catalog corruptions during upgrade #106217

Merged
merged 5 commits into from
Jul 6, 2023

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Jul 5, 2023

Backport 5/5 commits from #105832.

/cc @cockroachdb/release

Release justification: low-risk high-reward automatic descriptor corruption repair


This change enhances the recently-added upgrade precondition check which
checks for catalog corruptions by trying to repair these automatically
when possible.

Fixes: #104425
Fixes: #85265

Release note (general change): upgrading the cluster version to a new
release will not only check for descriptor- and other catalog
corruptions but will attempt to repair some of them on a best-effort
basis. This should seamlessly get rid of all longstanding descriptor
back-reference corruptions, which typically don't manifest themselves
until a schema change or an upgrade are performed.

@blathers-crl
Copy link

blathers-crl bot commented Jul 5, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar force-pushed the backport23.1-105832 branch from 85d5621 to ca1fd55 Compare July 5, 2023 21:04
Marius Posta added 5 commits July 5, 2023 20:18
Previously, function descriptors were not validated when querying this
virtual table. This patch fixes this.

Informs cockroachdb#104418.

Release note: None
Descriptor repairs which can be done in post-deserialization should
always be performed then and not elsewhere, otherwise we jeopardize the
effectiveness of the recently-added catalog corruption repair builtin.

This has no impact otherwise.

Release note: None
This virtual view lists system.descriptor and system.namespace entries
which can be repaired using crdb_internal.unsafe_* and
crdb_internal.repaired_descriptor builtin functions.

Conceptually, this view can be thought of as a projection from
crdb_internal.invalid_objects: anything that shows up in this view
should also show up in that table.

Informs cockroachdb#104425.

Release note: None
This new builtin is meant to be used with the recently-added
crdb_internal.kv_repairable_catalog_corruptions virtual table.

Typically,

    SELECT crdb_internal.repair_catalog_corruption(id, corruption)
    FROM "".crdb_internal.kv_repairable_catalog_corruptions

will ensure that a subsequent query of the vtable will return empty.

Informs cockroachdb#104425.

Release note: None
This change enhances the recently-added upgrade precondition check which
checks for catalog corruptions by trying to repair these automatically
when possible.

Fixes: cockroachdb#104425
Fixes: cockroachdb#85265

Release note (general change): upgrading the cluster version to a new
release will not only check for descriptor- and other catalog
corruptions but will attempt to repair some of them on a best-effort
basis. This should seamlessly get rid of all longstanding descriptor
back-reference corruptions, which typically don't manifest themselves
until a schema change or an upgrade are performed.
@postamar postamar force-pushed the backport23.1-105832 branch from ca1fd55 to 7a6036d Compare July 6, 2023 00:18
@postamar postamar marked this pull request as ready for review July 6, 2023 01:43
@postamar postamar requested a review from a team as a code owner July 6, 2023 01:43
@postamar postamar requested a review from a team July 6, 2023 01:43
@postamar postamar requested review from a team as code owners July 6, 2023 01:43
@postamar postamar requested review from nkodali and mgartner and removed request for a team, nkodali and mgartner July 6, 2023 01:43
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment, think some extra stuff was picked up in the logic test backport. Though it looks harmless and related to sorting.

Reviewed 1 of 1 files at r1, 6 of 6 files at r2, 16 of 16 files at r3, 4 of 4 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/logictest/testdata/logic_test/pg_builtins line 652 at r3 (raw file):

option_name  option_value

query TT colnames,nosort

What are these linked to?

@postamar
Copy link
Contributor Author

postamar commented Jul 6, 2023

Good of you to notice! Someone added sorting of those result sets but forgot or omitted to backport that change. I think it's harmless too.

@postamar postamar merged commit 4c7b13e into cockroachdb:release-23.1 Jul 6, 2023
@postamar postamar deleted the backport23.1-105832 branch July 6, 2023 13:59
@postamar
Copy link
Contributor Author

postamar commented Jul 6, 2023

Thanks for the quick review Faizan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants