backupccl: cluster backups include offline tables #88043
Labels
A-disaster-recovery
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
GA-blocker
T-disaster-recovery
Comments
msbutler added the C-bug, A-disaster-recovery, and T-disaster-recovery labels on Sep 16, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 19, 2022
Currently, RESTORE may restore invalid backup data from a backed-up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation. This patch ensures that RESTORE elides older backup data that was deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only their reintroduced data, when restoring to any system time, without the corrupt data.

Here's the basic implementation in RESTORE:
- For each table we want to restore:
  - identify the last time, l, the table was reintroduced, using the manifests
  - don't restore the table using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs in code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; otherwise, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are present on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Fixes cockroachdb#87305

Release justification: bug fix
Release note: none
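The elision rule in the commit message above can be sketched in Go as follows. This is a minimal illustration, not the backupccl implementation: the `Backup` type, its fields, and the helpers `lastIntroduced`, `backupsToRestore`, and `exampleChain` are all hypothetical names introduced here, and spans are reduced to plain strings.

```go
package main

import "fmt"

// Backup is a hypothetical, simplified stand-in for a backup manifest.
type Backup struct {
	Name    string
	EndTime int64
	// Spans lists every span this backup has files for.
	Spans []string
	// Introduced lists the spans this backup captured from timestamp 0.
	// In this model a full backup introduces all of its spans.
	Introduced []string
}

func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// lastIntroduced returns l: the EndTime of the latest backup in the chain
// that (re)introduced span, or 0 if no backup introduced it.
func lastIntroduced(chain []Backup, span string) int64 {
	var l int64
	for _, b := range chain {
		if contains(b.Introduced, span) && b.EndTime > l {
			l = b.EndTime
		}
	}
	return l
}

// backupsToRestore returns the names of the backups whose files would be
// ingested for span, skipping every backup with EndTime < l.
func backupsToRestore(chain []Backup, span string) []string {
	l := lastIntroduced(chain, span)
	var names []string
	for _, b := range chain {
		if !contains(b.Spans, span) || b.EndTime < l {
			continue
		}
		names = append(names, b.Name)
	}
	return names
}

// exampleChain models a full backup followed by an incremental backup
// that reintroduced foo (for example after an IMPORT rollback).
func exampleChain() []Backup {
	return []Backup{
		{Name: "full", EndTime: 10, Spans: []string{"foo", "bar"}, Introduced: []string{"foo", "bar"}},
		{Name: "incremental", EndTime: 20, Spans: []string{"foo", "bar"}, Introduced: []string{"foo"}},
	}
}

func main() {
	for _, span := range []string{"foo", "bar"} {
		fmt.Println(span, backupsToRestore(exampleChain(), span))
	}
}
```

In this example, foo is restored only from the incremental backup that reintroduced it, while bar, which was never reintroduced, is restored from both backups.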
Hi @msbutler, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
msbutler added the branch-release-22.2 label on Sep 20, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 21, 2022
Currently, RESTORE may restore invalid backup data from a backed-up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation. This patch ensures that RESTORE elides older backup data that was deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only their reintroduced data, when restoring to any system time, without the corrupt data.

Here's the basic implementation in RESTORE:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs in code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; otherwise, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are present on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix
Release note: fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 21, 2022
Currently, RESTORE may restore invalid backup data from a backed-up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation. This patch ensures that RESTORE elides older backup data that was deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only their reintroduced data, when restoring to any system time, without the corrupt data.

Here's the basic implementation in RESTORE:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs in code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; otherwise, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are present on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix
Release note (bug fix): fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
craig bot pushed a commit that referenced this issue on Sep 22, 2022
87312: backupccl: elide spans from backups that were subsequently reintroduced r=dt,adityamaru a=msbutler

Currently, RESTORE may restore invalid backup data from a backed-up table that underwent an IMPORT rollback. See #87305 for a detailed explanation. This patch ensures that RESTORE elides older backup data that was deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only their reintroduced data, when restoring to any system time, without the corrupt data.

Here's the basic implementation in RESTORE:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs in code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; otherwise, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are present on older cluster versions (#88042, #88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs #87305

Release justification: bug fix
Release note: fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198

88384: server: return elapsed time for active executions r=xinhaoz a=xinhaoz

Previously, we calculated the time elapsed for an active stmt or txn based on the start time returned from the server and the time the response was last received. Calculating this value on the client is not reliable and can lead to negative values when the server time is slightly ahead. This commit fixes this issue by including the time elapsed as part of the active txns and stmts response.

Release note (bug fix): time elapsed for active txns and stmts is never negative.

88449: kvserver: fix flaky test for consistency checks r=erikgrinaker a=pavelkalinnikov

There was a race in selecting between a canceled context.Done and a 0-time timer.

Fixes #88133

Release justification: flaky test fix
Release note: None

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
blathers-crl bot pushed a commit that referenced this issue on Sep 22, 2022
Currently, RESTORE may restore invalid backup data from a backed-up table that underwent an IMPORT rollback. See #87305 for a detailed explanation. This patch ensures that RESTORE elides older backup data that was deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only their reintroduced data, when restoring to any system time, without the corrupt data.

Here's the basic implementation in RESTORE:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs in code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; otherwise, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are present on older cluster versions (#88042, #88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs #87305

Release justification: bug fix
Release note (bug fix): fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 23, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 29, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 29, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Sep 30, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Oct 4, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Oct 4, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue on Oct 12, 2022
In pre-22.2 releases, it was assumed that cluster backups exclude offline tables, but in fact they include them. This mistaken assumption can lead to corrupt data on restore. Consider:
t0: begin IMPORT on foo
t1: conduct cluster backup - captures foo's pre-import state and some importing data
t2: roll back the import on foo via a non-MVCC clear range
t3: conduct incremental backup - the non-MVCC clear range leaves nothing for the incremental backup to capture
t4: restore foo to latest time - the restored table still contains the importing data captured at t1
For cluster backups without revision history, this bug could be fixed by simply excluding the table from the backup. For cluster backups with revision history, a more complex fix is necessary, as outlined in #88042.
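The timeline above can be reproduced with a toy model. The sketch below is an assumption-laden simplification, not CockroachDB code: the `store` type, the `put`, `clearKeys`, `backup`, `restore`, and `simulate` helpers, the key names, and the concrete timestamps are all hypothetical. It models an incremental backup as a copy of every MVCC version written in its time window, and a non-MVCC clear as a delete that leaves no version behind.

```go
package main

import "fmt"

// version is one MVCC version of a key.
type version struct {
	ts    int64
	value string
}

// store is a toy MVCC keyspace: key -> versions.
type store map[string][]version

func (s store) put(key string, ts int64, value string) {
	s[key] = append(s[key], version{ts, value})
}

// clearKeys models a non-MVCC clear: history is erased and no tombstone
// (and therefore no timestamped record) is left behind.
func (s store) clearKeys(keys ...string) {
	for _, k := range keys {
		delete(s, k)
	}
}

// backup copies every version with start < ts <= end.
func (s store) backup(start, end int64) store {
	out := store{}
	for k, vs := range s {
		for _, v := range vs {
			if v.ts > start && v.ts <= end {
				out[k] = append(out[k], v)
			}
		}
	}
	return out
}

// restore layers the given backups and returns the latest value per key.
func restore(layers ...store) map[string]string {
	latest := map[string]version{}
	for _, layer := range layers {
		for k, vs := range layer {
			for _, v := range vs {
				if cur, ok := latest[k]; !ok || v.ts > cur.ts {
					latest[k] = v
				}
			}
		}
	}
	out := map[string]string{}
	for k, v := range latest {
		out[k] = v.value
	}
	return out
}

// simulate walks through t0..t4 and returns the live and restored tables.
func simulate() (live, restored map[string]string) {
	s := store{}
	s.put("foo/1", 5, "pre-import row")
	s.put("foo/2", 12, "partially imported row") // t0 = 10: IMPORT begins
	full := s.backup(0, 15)                       // t1 = 15: cluster backup
	s.clearKeys("foo/2")                          // t2 = 20: non-MVCC rollback
	inc := s.backup(15, 25)                       // t3 = 25: incremental backup
	return restore(s), restore(full, inc)         // t4: restore to latest time
}

func main() {
	live, restored := simulate()
	fmt.Println("live:    ", live)
	fmt.Println("restored:", restored)
}
```

In this model the restored table keeps the partially imported row that the live table no longer has, because the incremental backup contains nothing that overrides the data captured at t1.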
Jira issue: CRDB-19658