-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RESTORE: Restores may split ranges between column families of same row #109483
Labels
A-disaster-recovery
branch-master
Failures and bugs on the master branch.
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-technical-advisory
Caused a technical advisory
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
S-0-corruption-or-data-loss
Unrecoverable corruption, data loss, or other catastrophic issues that can’t be fixed by upgrading.
T-disaster-recovery
Comments
dt
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
branch-master
Failures and bugs on the master branch.
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
branch-release-23.1.9-rc
labels
Aug 25, 2023
cc @cockroachdb/disaster-recovery |
dt
added
S-0-corruption-or-data-loss
Unrecoverable corruption, data loss, or other catastrophic issues that can’t be fixed by upgrading.
T-disaster-recovery
and removed
T-disaster-recovery
labels
Aug 25, 2023
cc @cockroachdb/disaster-recovery |
rhu713
pushed a commit
to rhu713/cockroach
that referenced
this issue
Aug 30, 2023
Remove all calls to key.Next() from generateAndSendImportSpans so that all cover spans have valid keys as start and end keys. Fixes: cockroachdb#109483 Release note: None
rhu713
pushed a commit
to rhu713/cockroach
that referenced
this issue
Aug 30, 2023
Remove all calls to key.Next() from generateAndSendImportSpans so that all cover spans have valid keys as start and end keys. This fixes an issue where a split can be called on an invalid key that's in the form of `someValidKey.Next()` during restore. These invalid keys will generally have a `NULL` at the end of the key, which will result in an error when calling `EnsureSafeSplits` on this split key. Currently errors from `EnsureSafeSplits` are ignored, and thus a split will always be attempted on this type of invalid split key. This split key can land in the middle of a row with column families, and thus result in failing SQL queries when querying the restored table. Fixes: cockroachdb#109483 Release note: None
lidorcarmel
added a commit
to lidorcarmel/cockroach
that referenced
this issue
Aug 30, 2023
Restore may use unsafe keys as split points, which may cause unsafe splits between column families, which may cause SQL to fail when reading the row, or worse, return wrong resutls. This commit avoids splitting on keys that might be unsafe. See the issue for more info. Epic: none Informs: cockroachdb#109483 Release note: None.
rhu713
pushed a commit
to rhu713/cockroach
that referenced
this issue
Aug 30, 2023
Remove all calls to key.Next() from generateAndSendImportSpans so that all cover spans have valid keys as start and end keys. This fixes an issue where a split can be called on an invalid key that's in the form of `someValidKey.Next()` during restore. These invalid keys will generally have a `NULL` at the end of the key, which will result in an error when calling `EnsureSafeSplits` on this split key. Currently errors from `EnsureSafeSplits` are ignored, and thus a split will always be attempted on this type of invalid split key. This split key can land in the middle of a row with column families, and thus result in failing SQL queries when querying the restored table. A slight modification was also made to restore span generation to only create spans out of the start keys of the file instead of both start and end keys. This is not expected to change much to the generated spans as most of the time the end key of a file span is the same as the start key of the subsequent file, but it simplifies the restore span generation logic and may result in slighly larger spans during restore. Fixes: cockroachdb#109483 Release note: None
craig bot
pushed a commit
that referenced
this issue
Oct 13, 2023
112321: roachtest: maybe init tpcc with column families in backup-restore tests r=adityamaru a=msbutler This patch inits the tpcc workload with column families 50% of the time in the backup-mixed-version and backup-restore/roundtrip tests. This patch is the first step to recreating a roachtest workload that can reproduce #109483. Release note: None Epic: none Co-authored-by: Michael Butler <[email protected]>
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 13, 2023
This patch metamorphically reduces the default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the patch metamorphically reduces the backup.restore_span.target_size setting to increase test coverage of generateImportSpans. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 16, 2023
This patch inits the tpcc workload with column families 50% of the time in the backup-mixed-version and backup-restore/roundtrip tests. This patch is the first step to recreating a roachtest workload that can reproduce #109483. Release note: None Epic: none
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 16, 2023
This patch inits the tpcc workload with column families 50% of the time in the backup-mixed-version and backup-restore/roundtrip tests. This patch is the first step to recreating a roachtest workload that can reproduce #109483. Release note: None Epic: none
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 17, 2023
This patch metamorphically reduces the default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the patch metamorphically reduces the backup.restore_span.target_size setting to increase test coverage of generateImportSpans. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 19, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 19, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 19, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 24, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Oct 25, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
craig bot
pushed a commit
that referenced
this issue
Oct 26, 2023
112356: roachtest: add backup-restore/small-ranges r=renatolabs a=msbutler This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to #109483 but at smaller data sizes. Informs #109483 Release note: None Co-authored-by: Michael Butler <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 27, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to #109483 but at smaller data sizes. Informs #109483 Release note: None
msbutler
added a commit
to msbutler/cockroach
that referenced
this issue
Nov 3, 2023
This patch adds a new backup-restore roachtest variant that reduces default range size for user databases in the backup-restore/round-trip roachtest to simulate a larger cluster, in terms of range count, at smaller data sizes. In addition, the roachtest scales down a few cluster settings such that the ratio of rangeSize/clusterSetting remains constant. This patch should allow us to recreate a roachtest workload that can simulate the conditions that lead to cockroachdb#109483 but at smaller data sizes. Informs cockroachdb#109483 Release note: None
rytaft
added
C-technical-advisory
Caused a technical advisory
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
labels
Nov 30, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-disaster-recovery
branch-master
Failures and bugs on the master branch.
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-technical-advisory
Caused a technical advisory
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
S-0-corruption-or-data-loss
Unrecoverable corruption, data loss, or other catastrophic issues that can’t be fixed by upgrading.
T-disaster-recovery
This is coming from https://github.com/cockroachlabs/support/issues/2549#issuecomment-1692104972
If a DB was restored, and the schema of that DB has a table with multiple column families, SQL may fail reading or return wrong results.
Restore calls
EnsureSafeSplitKey()
before using a key as a split point, and then passes it toAdminSplit()
. If the key was a valid key thenEnsureSafeSplitKey()
modifies the key to not be between column families, meaning it ensures the key would be a safe split point.If the key is not a table key then
EnsureSafeSplitKey()
fails, but Restore can still split on that key - it is safe to split on non-table keys. But, if the key is a table key thatEnsureSafeSplitKey()
cannot parse then it will fail, and Restore (wrongfully) assumes that the key is again, not a table key.In the bug here, the key was produced by
Key.Next()
which adds a 0 (NULL) to the end of the key, and that key cannot be parsed properly byEnsureSafeSplitKey()
. Restore still used that key as a split point even though it may be between column families.If the schema has column families, and an unsafe split point falls between column families then, when reading the table, SQL may either fail because it cannot find the column that is in another range (a different column family), or worse, SQL may succeed but return NULL if those missing columns are nullable (wrong result).
Worth mentioning that the data is not lost, and therefore by changing the range splits the DB can be fixed.
There are a things to do here:
Key.Next()
for restore spans (or at least avoid allowing those to be used as split point).Jira issue: CRDB-30944
The text was updated successfully, but these errors were encountered: