Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: cluster backups include offline tables #88043

Closed
msbutler opened this issue Sep 16, 2022 · 2 comments
Closed

backupccl: cluster backups include offline tables #88043

msbutler opened this issue Sep 16, 2022 · 2 comments
Assignees
Labels
A-disaster-recovery branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-disaster-recovery

Comments

@msbutler
Copy link
Collaborator

msbutler commented Sep 16, 2022

In pre 22.2 releases, it was assumed that cluster backups excluded offline tables, but in fact they do include them. This misconception can lead to corrupt data in restore. Consider:

t0: begin IMPORT on foo
t1: conduct cluster backup - captures foo's pre-import state and some importing data
t2: rollback import foo via non-mvcc clear range
t3: conduct incremental backup
t4: restore foo to latest time

  • b/c of the non-mvcc clear range, the incremental backup is completely naive to the rollback, thus, the importing data will get restored.

For cluster backups without revision history, this bug could be fixed by simply excluding the table from the backup. For cluster backups with revision history, a more complex fix is necessary, as outlined in #88042

Jira issue: CRDB-19658

@msbutler msbutler added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-disaster-recovery T-disaster-recovery labels Sep 16, 2022
@msbutler msbutler self-assigned this Sep 16, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 19, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each table we want to restore
   - identify the last time, l, the table was re-introduced, using the manifests
   - dont restore the table using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Fixes cockroachdb#87305

Release justification: bug fix

Release note: none
@blathers-crl
Copy link

blathers-crl bot commented Sep 20, 2022

Hi @msbutler, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@msbutler msbutler added the branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 label Sep 20, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 21, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note: fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 21, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
craig bot pushed a commit that referenced this issue Sep 22, 2022
87312: backupccl: elide spans from backups that were subsequently reintroduced r=dt,adityamaru a=msbutler

Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See #87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (#88042, #88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs #87305

Release justification: bug fix

Release note: fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198

88384: server: return elapsed time for active executions r=xinhaoz a=xinhaoz

Previously, we calculated the time elapsed for an active stmt or txn based on the start time returned from the server and the time the response was last received. Calculating this value on the client is not reliable and can lead to negative values when the server time is slightly ahead. This commit fixes this issue by including the time elapsed as part of the active txns and stmts response.

Release note (bug fix): time elapsed for active txns and stmts is never negative.

88449: kvserver: fix flaky test for consistency checks r=erikgrinaker a=pavelkalinnikov

There was a race in selecting between a canceled context.Done and 0-time timer.

Fixes #88133

Release justification: flaky test fix
Release note: None

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Sep 22, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See #87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (#88042, #88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs #87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 23, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 29, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 29, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 30, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Oct 4, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Oct 4, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue Oct 12, 2022
Currently RESTORE may restore invalid backup data from a backed up table that
underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via
a non-MVCC operation. Because incremental backups always reintroduce spans
(i.e. backs them up from timestamp 0) that may have undergone a non-mvcc
operation, restore can identify restoring spans with potentially corrupt data
in the backup chain and only ingest the spans' reintroduced data to any system
time, without the corrupt data.

Here's the basic impliemenation in Restore:
- For each span we want to restore
   - identify the last time, l, the span was introduced, using the manifests
   - dont restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each
restoration flow (created in createImportingDescriptors) and the
restoreSpanEntries (created by makeSimpleImportSpans) do not span across
multiple tables. Given this assumption, makeSimpleImportSpans skips adding
files from a backups for a given input span that was reintroduced in a
subsequent backup.

It's worth noting that all significant refactoring occurs on code run by
the restore coordinator; therefore, no special care needs to be taken for
mixed / cross version backups. In other words, if the coordinator has updated,
the cluster restores properly; else, the bug will exist on the restored cluster.
It's also worth noting that other forms of this bug are apparent on older
cluster versions (cockroachdb#88042, cockroachdb#88043) and has not been noticed by customers; thus,
there is no need to fail a mixed version restore to protect the customer from
this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory
https://cockroachlabs.atlassian.net/browse/TSE-198
@msbutler
Copy link
Collaborator Author

msbutler commented Oct 13, 2022

closed via

22.1 PR
#88488
21.2 PR
#89019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-disaster-recovery
Projects
No open projects
Archived in project
Development

No branches or pull requests

1 participant