Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: ignore all dropped descriptors during backup #76635

Merged
merged 1 commit into from
Feb 17, 2022

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Feb 16, 2022

Previously, descriptors that were resolved at EndTime
and were in a DROP state were not treated uniformly. While
we ignored table descriptors, we continued to backup database,
schema and type descrpitors.

This resulted in atleast two bugs:

  1. If a database descriptor was in a dropped state, and a new
    descriptor with the same name was created then a BACKUP DATABASE
    of the new database would fail during resolution.

  2. A cluster backup of the above state would succeed, but since it
    would include duplicate entries for the same name database, the restore
    of such a backup would fail.

This change unifies the behaviour by ignoring all DROP descriptors seen
by the backup at EndTime. A follow up PR will teach restore to ignore
all dropped descriptors so as to allow users with "corrupt" backups as
explained in 2) to be able to restore.

Informs: #76517

Release note (bug fix): Backup incorrectly backed up database, schema,
and type descriptors that were in a DROP state at the time the backup was
run. This bug resulted in the user being unable to backup and restore if
their cluster had dropped and public descriptors with colliding names.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru force-pushed the cluster-restore-dropped branch from 880da3b to c499e95 Compare February 16, 2022 21:06
Previously, descriptors that were resolved at `EndTime`
and were in a `DROP` state were not treated uniformly. While
we ignored table descriptors, we continued to backup database,
schema and type descrpitors.

This resulted in  atleast two bugs:
1) If a database descriptor was in a dropped state, and a new
descriptor with the same name was created then a BACKUP DATABASE
of the new database would fail during resolution.

2) A cluster backup of the above state would succeed, but since it
would include duplicate entries for the same name database, the restore
of such a backup would fail.

This change unifies the behaviour by ignoring all DROP descriptors seen
by the backup at `EndTime`. A follow up PR will teach restore to ignore
all dropped descriptors so as to allow users with "corrupt" backups as
explained in 2) to be able to restore.

Release note (bug fix): Backup incorrectly backed up database, schema,
and type descriptors that were in a DROP state at the time the backup was
run. This bug resulted in the user being unable to backup and restore if
their cluster had dropped and public descriptors with colliding names.
@adityamaru adityamaru force-pushed the cluster-restore-dropped branch from c499e95 to 451274b Compare February 17, 2022 13:18
@adityamaru adityamaru changed the title [WIP,DNM] backupccl: ignore dropped database descriptors backupccl: ignore all dropped descriptors during backup Feb 17, 2022
@adityamaru adityamaru marked this pull request as ready for review February 17, 2022 13:19
@adityamaru adityamaru requested review from a team, msbutler, ajwerner and dt and removed request for a team February 17, 2022 13:19
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @msbutler)


pkg/ccl/backupccl/backupresolver/targets.go, line 341 at r1 (raw file):

		if _, ok := alreadyRequestedDBs[dbID]; !ok {
			desc := r.DescByID[dbID]
			// Verify that the database is in the correct state.

Note that we do this rather than just filtering because we already resolved this database by ID above?

Code quote:

te.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ajwerner
Copy link
Contributor

bors r=ajwerner,stevendanna

@craig
Copy link
Contributor

craig bot commented Feb 17, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 17, 2022

Build succeeded:

@craig craig bot merged commit a6ef4b0 into cockroachdb:master Feb 17, 2022
craig bot pushed a commit that referenced this pull request Feb 18, 2022
76715: schemachanger,backupccl: support for backup and restore mid-change r=ajwerner a=ajwerner

### schemachanger,backupccl: support for backup and restore mid-change

This commit touches the schema changer to encode some more metadata about the
state of the schema change into the descriptors in order to better facilitate
synthesizing a job at restore time.

The major contribution, then, is to support the declarative schema changer in
restore. Along the way, it picks up some catalog niceties like batches
descriptor retreival and the nstree.Catalog.

Finally, the code is tested with a suite of datadriven tests which run schema
changes and then take the state and BACKUP/RESTORE it at each of the possible
stages (including all of the rollback stages) and makes sure that the right
outcome ultimately happens.

Note that this is currently just tested for database-level BACKUP/RESTORE
and that more testing is planned as follow-up.

### sql/schemachanger: move where we remove namespace entries to be earlier

This should really use the namespace element, but I needed this fix
to not break some other tests I'm writing.

### backupccl: don't back up dropped databases or schemas

Hack to deal with bugs that Aditya is fixing concurrently in #76635.

### sql/schemachanger: add more end-to-end testing

This reworking is a precursor to adding backup/restore testing. Pause testing
is also extremely valuable. It caught a bug when we resume after having
finished a backfill but before we marked it as complete. This was bad
for two reasons:

1) We weren't catching the panic on those goroutines (lesson learned, again)
2) We weren't detecting the case where we were done.

Fixes #73071 

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@adityamaru adityamaru deleted the cluster-restore-dropped branch February 21, 2022 14:55
@adityamaru
Copy link
Contributor Author

blathers backport 21.2

@blathers-crl
Copy link

blathers-crl bot commented Aug 3, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 451274b to blathers/backport-release-21.2-76635: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants