backupccl: teach makeSimpleImportSpans to exclude backup data on spans that were later reintroduced #87305
Comments
cc @cockroachdb/bulk-io
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only the spans' reintroduced data, to any system time, without the corrupt data.

Here's the basic implementation in Restore:
- For each index we want to restore:
  - identify the last time, l, the index was re-introduced, using the manifests
  - don't restore the index using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple SQL indices. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

Release justification: bug fix

Release note: none
Is this a release blocker on 22.1? That seems like a mistake, should it be 22.2 instead?
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only the spans' reintroduced data, to any system time, without the corrupt data.

Here's the basic implementation in Restore:
- For each index we want to restore:
  - identify the last time, l, the index was re-introduced, using the manifests
  - don't restore the index using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple SQL indices. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

Fixes cockroachdb#87305

Release justification: bug fix

Release note: none
Well, it turns out this bug is in 22.1, and most likely in earlier releases, but only for cluster backup chains, as they have been backing up offline tables for as long as they have been around, I think. I confirmed that this bug is on 22.1 by executing the SQL commands in my data-driven test on a 22.1 roachprod cluster. See my SQL shell log with the commands I ran. The bug manifests in the following ways:
I have not confirmed this bug is on 21.2 because it's harder to reproduce without the |
Verified this bug is on 21.2.15 using the following cmds on a roachprod cluster. Note that descriptor resolution for cluster backups does not filter offline tables (just dropped ones), likely for job resolution on the restored cluster.
On 22.1, I found a slightly different bug, which I'll file an issue for once I get to the root cause. On #87873, I tried to fix the cluster backup bug described above on 22.1 by excluding offline tables. For backups without revision history this works, but for backups with revision history (cluster/database backups), I found another surprise: the first incremental backup that observes the table back online will reintroduce the clear-ranged data, and restore will restore the clear-ranged data :(. This behavior isn't apparent on master. I have yet to find a root cause, but I suspect the bug is in the KV server.
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only the spans' reintroduced data, to any system time, without the corrupt data.

Here's the basic implementation in Restore:
- For each table we want to restore:
  - identify the last time, l, the table was re-introduced, using the manifests
  - don't restore the table using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs on code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; else, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are apparent on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Fixes cockroachdb#87305

Release justification: bug fix

Release note: none
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only the spans' reintroduced data, to any system time, without the corrupt data.

Here's the basic implementation in Restore:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs on code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; else, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are apparent on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note: fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See cockroachdb#87305 for a detailed explanation.

This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. back them up from timestamp 0) that may have undergone a non-MVCC operation, RESTORE can identify spans with potentially corrupt data in the backup chain and ingest only the spans' reintroduced data, to any system time, without the corrupt data.

Here's the basic implementation in Restore:
- For each span we want to restore:
  - identify the last time, l, the span was introduced, using the manifests
  - don't restore the span using a backup if backup.EndTime < l

This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backup for a given input span that was reintroduced in a subsequent backup.

It's worth noting that all significant refactoring occurs on code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has been updated, the cluster restores properly; else, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are apparent on older cluster versions (cockroachdb#88042, cockroachdb#88043) and have not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.

Informs cockroachdb#87305

Release justification: bug fix

Release note (bug fix): fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
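For readers skimming the thread, here is a rough, standalone Go sketch of the filtering rule described in the commit message above. It is not the backupccl code: the Span, File, and Backup types, the IntroducedSpans field, and the helpers lastIntroduced and filesFor are all hypothetical stand-ins. The commit message does not say which timestamp l is taken from, so the sketch assumes l is the StartTime of the latest layer that introduced the span. The toy chain in main is made up for illustration.

```go
package main

import "fmt"

// Span is a half-open key range [Start, End); a stand-in for a real key span.
type Span struct{ Start, End string }

// Contains reports whether s fully covers o.
func (s Span) Contains(o Span) bool { return s.Start <= o.Start && o.End <= s.End }

// Overlaps reports whether s and o share any keys.
func (s Span) Overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// File is one backed-up data file and the key span it covers.
type File struct {
	Path string
	Span Span
}

// Backup is a hypothetical, pared-down manifest for one layer of a chain.
type Backup struct {
	StartTime, EndTime int64
	IntroducedSpans    []Span // spans this layer backed up from timestamp 0
	Files              []File
}

// lastIntroduced returns l for span: the StartTime of the latest layer that
// introduced it (an assumption of this sketch), or 0 if only the full backup did.
func lastIntroduced(chain []Backup, span Span) int64 {
	var l int64
	for _, b := range chain {
		for _, in := range b.IntroducedSpans {
			if in.Contains(span) && b.StartTime > l {
				l = b.StartTime
			}
		}
	}
	return l
}

// filesFor collects the files to ingest for span, skipping every layer whose
// EndTime is before l.
func filesFor(chain []Backup, span Span) []File {
	l := lastIntroduced(chain, span)
	var out []File
	for _, b := range chain {
		if b.EndTime < l {
			continue // layer predates the reintroduction; its data may be stale
		}
		for _, f := range b.Files {
			if f.Span.Overlaps(span) {
				out = append(out, f)
			}
		}
	}
	return out
}

func main() {
	foo := Span{"/Table/53", "/Table/54"}
	chain := []Backup{
		{StartTime: 0, EndTime: 10, IntroducedSpans: []Span{foo}, Files: []File{{"full/1.sst", foo}}},
		{StartTime: 10, EndTime: 20},
		{StartTime: 20, EndTime: 30, IntroducedSpans: []Span{foo}, Files: []File{{"inc2/1.sst", foo}}},
	}
	for _, f := range filesFor(chain, foo) {
		fmt.Println(f.Path)
	}
}
```

The sketch leans on the same assumption the commit message states: each input span belongs to a single table, so a simple containment check against the introduced spans is enough to find l.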
Currently RESTORE cannot properly handle restoring a table that had a non-mvcc rollback, without adding some logic to makeSimpleImportSpans.
Consider the following timeline:
This occurs because 1) makeSimpleImportSpans naively includes the backup file with the imported data as part of foo's restoreSpanEntry and 2) the backups have MVCC delete history related to the rollback. This was never a problem in previous releases because offline importing data were not backed up until the import succeeded.
To address this, makeSimpleImportSpans needs to exclude backup files for a span that was later reintroduced due to a non-MVCC bulk op.
Jira issue: CRDB-19262