Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: disallow restore of backups older than the minimum supported binary version #93804

Closed
wants to merge 15 commits into from

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Dec 16, 2022

As outlined in https://www.cockroachlabs.com/docs/v22.2/restoring-backups-across-versions.html,
we do not support restoring backups older than one major version. This
patch enforces this in restore code by ensuring that the manifest's version is greater
than equal to the minimum supported binary version of the cluster.

Release note (sql change): Backups older than one major version
from the cluster's current active version are not restoreable as per
the policy in https://www.cockroachlabs.com/docs/v22.2/restoring-backups-across-versions.html.

Fixes: #94295

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru
Copy link
Contributor Author

Lots of tests are going to break, I want CI to tell me what fixtures I need to regenerate.

@adityamaru adityamaru force-pushed the cleanup-rstore branch 5 times, most recently from 509f705 to e74c6bf Compare December 27, 2022 07:12
@adityamaru adityamaru changed the title backupccl: disallow restore of backups greater than one major version old backupccl: disallow restore of backups older than the minimum supported binary version Dec 27, 2022
@adityamaru
Copy link
Contributor Author

adityamaru commented Dec 27, 2022

I have one TODO to resolve 743d5e1#diff-87e52e674b19b4c9ef048ee88b0a6b5feec243b869bb266a5551826f6997b0c8R72 and ensure our restore roachtests are up to date with this policy, but this should be ready for a look.

I will squash all the commits once I get an LGTM! I'm primarily looking for reviewers to tell me if I've deleted a test I shouldn't have.

@adityamaru adityamaru marked this pull request as ready for review December 27, 2022 07:16
@adityamaru adityamaru requested review from a team as code owners December 27, 2022 07:16
@adityamaru adityamaru requested review from msbutler, herkolategan, smg260, stevendanna and dt and removed request for a team and smg260 December 27, 2022 07:16
…ed binary version

As outlined in https://www.cockroachlabs.com/docs/v22.2/restoring-backups-across-versions.html,
we do not support restoring backups older than one major version. This
patch enforces this in restore code by ensuring that the manifest's version is greater
than equal to the minimum supported binary version of the cluster.

Release note (sql change): Backups older than one major version
from the cluster's current active version are not restoreable as per
the policy in https://www.cockroachlabs.com/docs/v22.2/restoring-backups-across-versions.html.

Fixes: cockroachdb#94295
Interleaved tables were removed >1 major version in
the past and so a backup restoreable in 23.1 cannot contain
interleaved tables.

Release note: None
The xDBRef and duplicate-db exceptional backups
can no longer be restored since they were generated
>1 major version in the past. This change does add
a test inspired by the duplicate-db test to expand
coverage.

Release note: None
This change removes the test that was added as part of
were correctly upgraded on restore to 20.x+ versions.
Backups taken one major release ago will no longer have
these old style fks and so this test serves better as a
regular test on this branch.

Release note: None
This change removes old cluster restore fixtures, and replaces them
with a cluster backup from 22.2.0. It also removes a 20.1 fixture that
tested restore of a zoneconfig privilege that was overwritten as the usage
privilege in 20.2+ versions. This test has been migrated to a regular
test instead.

Release note: None
This change removes an old MR backup that was being
restored and tested against. There were no instructions
on how to regenerate the fixture in cockroachdb#69725 so it is being
deleted.

Release note: None
Prior to 22.2 the public schema was not backed by
a descriptor. These fixtures tested that a restore
of a pre 22.2 backup would remap the synthetic public
schema to a descriptor backed public schema. Since the
only restoreable backups on this branch will be from
22.2 onwards, we should never see a synthetic public schema
in a restoreable backup.

Release note: None
The backup fixtures in this test were generated on 20.2 where
cockroachdb#62564 existed.
Given this was >1 major version in the past we do not expect to
see a reoccurrence of the bug in 22.2+ backups that are restoreable
in this release.

Release note: None
This fixture captured a bug in 20.2.7 that has since
been fixed. This change removes the fixture and moves it
to a regular regression test.

Release note: None
These fixtures were generate on older version of
Cockroach that are no longer restoreable. The test
has been migrated to a regular datadriven test.

Release note: None
This change regenerate the sequences backup fixture on 22.2.0.
While there have been no changes to the representation of sequences
between 22.2 and 23.1 it seemed useful to keep a cross version
backup test of this nature.

Release note: None
All users in 22.2+ clusters will have IDs and so
will the backups taken on these clusters. Thus, there
is no need to keep around the cross version test
that tests the custom restore func that injects a
user ID.

Release note: None
This change regenerates the fixtures for testing
restores of backups that have been taken before or after
the backfill portion of a schemachange. Note these tests
rely on the old schemachanger and do not increase our test
coverage with the new declarative schema changer.

Release note: None
This test was testing a bug that was discovered during
backups in pre-22.2 versions cockroachdb#88042.
Since this bug does not exist on 22.2+ versions
this test will not encounter the expected corruption
on backups that are restoreable in this cluster.

Release note: None
This change adds an env variable that will be set by the
job running restore roachtests until we have updated all their
fixtures to be 22.2.0+.

Release note: None
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read all of the test code updates, but overall this seems reasonable if we still want to enforce this in code. Left one small question.

Comment on lines +1530 to +1532
return errors.WithHint(errors.Newf("the backup is from a version older than our "+
"minimum restoreable version %s", minimumRestorableVersion),
"refer to our documentation about restoring across versions: https://www.cockroachlabs.com/docs/v22.2/restoring-backups-across-versions.html")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the v.Major == 0 case imply we never recorded a version in the manifest? If so, does it definitely need to happen before the currentVersion.Less check since current version will always be greater? that would help keep the two related checks next to each other.

msbutler added a commit to msbutler/cockroach that referenced this pull request Mar 13, 2023
…mework

The restore/nodeshutdown tests have been using a very old workload that will
not be restorable when cockroachdb#93804 lands. This patch changes the
restore/nodeshutdown workload to a 80GB tpce restore and moves the tests to run
on aws instead of gcp.

Release note: None

Epic: None
@adityamaru
Copy link
Contributor Author

Closing in favour of #98597.

@adityamaru adityamaru closed this Mar 14, 2023
craig bot pushed a commit that referenced this pull request Mar 14, 2023
97138: ui: add error code to stmt and txn insights details pages r=gtr a=gtr

Part of: #87785.

Previously, the stmt and txn insights details pages did not show any
further information for failed executions. This commit adds an "error
code" column to the insights table for a failed execution in the stmt
and txn insights details pages. Additionally, a "status" column was
added to the stmt and txn workload insights tables which is either
"Completed" or "Failed".

Future work involves adding the error message string in addition to the
error code but it needs to be redacted first. Additionally, the txn
status is missing the implementation of a "Cancelled" status.

Note to reviewers: only consider the second commit, as the first is 
required to get the txn status.

- Loom [demo](https://www.loom.com/share/e82b97ff9f034d82b98640170eb54408).

Release note (ui change): Adds error code column to the insights table
for a failed execution in the stmt and txn insights details page. Adds
status column to the stmt and txn workload insights tables.

98410: cluster-ui: tenants use sqlstats-supplied regions r=matthewtodd a=matthewtodd

Fixes #98056.

As of #95449, the SQL Activity pages in the DB Console can draw regions information directly from the sqlstats tables, rather than having to translate node IDs to regions on page load.
    
Here, we make that switch, but for non-system tenants only, because:
    
1. The ephemeral nature of serverless nodes made this view-time mapping especially problematic in that context. (See further notes in #95449.)
    
2. The system-tenant views also include KV node IDs in a special Regions/Nodes column, which we are unable to recreate given the backend storage structure. (Future design work might suggest removing these node IDs altogether, for a unified UI.)

# Screenshots!
## Statements, with and without regions filter
<img width="1372" alt="statements" src="https://user-images.githubusercontent.com/5261/225033247-739df90a-9173-4aab-a666-a61a1ceeb579.png">
<img width="1372" alt="statements - filtered" src="https://user-images.githubusercontent.com/5261/225033271-1c0d0f82-3dd4-48ea-bdef-11f19af97a85.png">

## Statement details
<img width="1372" alt="statement details" src="https://user-images.githubusercontent.com/5261/225033338-6dff4a6e-a4a3-48c6-863a-84f1375b0a61.png">

## Transactions, with and without regions filter
<img width="1372" alt="transactions" src="https://user-images.githubusercontent.com/5261/225033366-65f44e95-3549-47cc-b0f2-67ad48a1a1fa.png">
<img width="1372" alt="transactions - filtered" src="https://user-images.githubusercontent.com/5261/225033391-50b9a2dc-e9a1-457b-84b1-837426eba35e.png">

## Transaction details
<img width="1372" alt="transaction details" src="https://user-images.githubusercontent.com/5261/225033505-3fdeceef-35dc-4e06-af25-ab4d0c53518f.png">

Release note: None

98464: jobs,upgrades: add migration to backfill job_info table r=dt a=adityamaru

This change adds a migration and corresponding cluster version
after which every job entry in the system.jobs table will have its
Payload and Progress written to two rows in the system.job_info table.

Informs: #97762

Release note: None

98510: backupccl: update restore/nodeshutdown tests to use new roachtest framework r=adityamaru a=msbutler

The restore/nodeshutdown tests have been using a very old workload that will not be restorable when #93804 lands. This patch changes the restore/nodeshutdown workload to a 80GB tpce restore and moves the tests to run on aws instead of gcp.

Release note: None

Epic: None

98579: upgrade/upgrades: skip TestUpgradeSchemaChangerElements r=smg260 a=smg260

Refs: #98062

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
Epic: None

Co-authored-by: gtr <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Miral Gadani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backupccl: reject restores of backups greater than the clusters minimum supported binary version
3 participants