Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: stop including OFFLINE tables in backup #42606

Merged
merged 1 commit into from
Nov 20, 2019

Conversation

pbardea
Copy link
Contributor

@pbardea pbardea commented Nov 20, 2019

In 19.2, the OFFLINE table descriptor state was added. This state
indicates that the table is generally not visible to users as it is
being populated by the database (via an in-progress RESTORE or IMPORT).
However, these tables are currently included in backups.

When performing a BACKUP while a RESTORE or IMPORT is being executed,
there will be tables that exist in an OFFLINE state. These should not be
included in a BACKUP as when they are restored they will appear to be in
an intermediary and potentially inconsistent state.

Consider an OFFLINE table bank.pause. This table could be included in
a backup either via directly naming the table: BACKUP bank.pause TO ... or via table expansion: BACKUP bank.* TO .... In the first case,
an error should be returned as the table should not be directly visible
to the user. In the later case, OFFLINE tables should be silently
ignored since the OFFLINE table was not explicitly requested by the
user. The remaining PUBLIC tables in the database should be expanded.

Note: this commit also changes the name resolution logic for
changefeeds in the same way that it applies to backup.

To address this, the BACKUP and RESTORE name resolution logic was
modified to support filtering OFFLINE tables.

Release note (bug fix): Stop including tables that are being restored or
imported as valid targets in backups and changefeeds.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pbardea pbardea force-pushed the offline-table-expansion-fix branch 3 times, most recently from b74269b to 175e7e1 Compare November 20, 2019 20:31
@pbardea pbardea requested a review from dt November 20, 2019 20:31
@pbardea pbardea force-pushed the offline-table-expansion-fix branch from 175e7e1 to a0880ca Compare November 20, 2019 20:39
@pbardea
Copy link
Contributor Author

pbardea commented Nov 20, 2019

Wanted to call out that I added higher-level tests where a RESTORE is paused and it is checked that the restoring tables are not included in a backup. However, the logic around this test is currently skipped due to flakiness around managing job state (see #40639).

Unit tests should sufficiently tests this functionality for now though.

@pbardea
Copy link
Contributor Author

pbardea commented Nov 20, 2019

Also, this should be backported.

@dt dt requested review from ajwerner, aayushshah15 and dt and removed request for dt November 20, 2019 20:48
@dt
Copy link
Member

dt commented Nov 20, 2019

cc @aayushshah15 for impact on CDC.

@pbardea pbardea force-pushed the offline-table-expansion-fix branch 2 times, most recently from 1a09ba7 to db97a89 Compare November 20, 2019 21:27
@pbardea
Copy link
Contributor Author

pbardea commented Nov 20, 2019

I updated the PR to always filter out OFFLINE tables since RESTORE verifies it's targets against the descriptors stored in the backup - not the descriptors it will restore into.

Also CDC interacts with this here: https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/changefeed_stmt.go#L198.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

@ajwerner
Copy link
Contributor

For CDC I think this change works. You shouldn't be able to run a changefeed on a table which is OFFLINE.

@aayushshah15
Copy link
Contributor

:lgtm:

In 19.2, the OFFLINE table descriptor state was added. This state
indicates that the table is generally not visible to users as it is
being populated by the database (via an in-progress RESTORE or IMPORT).
However, these tables are currently included in backups.

When performing a BACKUP while a RESTORE or IMPORT is being executed,
there will be tables that exist in an OFFLINE state. These should not be
included in a BACKUP as when they are restored they will appear to be in
an intermediary and potentially inconsistent state.

Consider an OFFLINE table `bank.pause`. This table could be included in
a backup either via directly naming the table: `BACKUP bank.pause TO
...` or via table expansion: `BACKUP bank.* TO ...`. In the first case,
an error should be returned as the table should not be directly visible
to the user. In the later case, OFFLINE tables should be silently
ignored since the OFFLINE table was not explicitly requested by the
user. The remaining PUBLIC tables in the database should be expanded.

Note: this commit also changes the name resolution logic for
changefeeds in the same way that it applies to backup.

To address this, the BACKUP and RESTORE name resolution logic was
modified to support filtering OFFLINE tables.

Release note (bug fix): Stop including tables that are being restored or
imported as valid targets in backups and changefeeds.
@pbardea pbardea force-pushed the offline-table-expansion-fix branch from db97a89 to 56fec47 Compare November 20, 2019 21:57
@pbardea
Copy link
Contributor Author

pbardea commented Nov 20, 2019

TFTRs!
bors r=aayushshah15,ajwerner,dt

craig bot pushed a commit that referenced this pull request Nov 20, 2019
42606: backupccl: stop including OFFLINE tables in backup r=aayushshah15,ajwerner,dt a=pbardea

In 19.2, the OFFLINE table descriptor state was added. This state
indicates that the table is generally not visible to users as it is
being populated by the database (via an in-progress RESTORE or IMPORT).
However, these tables are currently included in backups.

When performing a BACKUP while a RESTORE or IMPORT is being executed,
there will be tables that exist in an OFFLINE state. These should not be
included in a BACKUP as when they are restored they will appear to be in
an intermediary and potentially inconsistent state.

Consider an OFFLINE table `bank.pause`. This table could be included in
a backup either via directly naming the table: `BACKUP bank.pause TO
...` or via table expansion: `BACKUP bank.* TO ...`. In the first case,
an error should be returned as the table should not be directly visible
to the user. In the later case, OFFLINE tables should be silently
ignored since the OFFLINE table was not explicitly requested by the
user. The remaining PUBLIC tables in the database should be expanded.

Note: this commit also changes the name resolution logic for
changefeeds in the same way that it applies to backup.

To address this, the BACKUP and RESTORE name resolution logic was
modified to support filtering OFFLINE tables.

Release note (bug fix): Stop including tables that are being restored or
imported as valid targets in backups and changefeeds.

Co-authored-by: Paul Bardea <[email protected]>
@craig
Copy link
Contributor

craig bot commented Nov 20, 2019

Build succeeded

@craig craig bot merged commit 56fec47 into cockroachdb:master Nov 20, 2019
@pbardea pbardea deleted the offline-table-expansion-fix branch November 22, 2019 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants