backupccl: duplicate Files in the BackupManifest result in a pebble SSTWriter error #97953

Closed
adityamaru opened this issue Mar 2, 2023 · 1 comment · Fixed by #98041
Labels:
* branch-release-22.2: Used to mark GA and release blockers, technical advisories, and bugs for 22.2
* branch-release-23.1: Used to mark GA and release blockers, technical advisories, and bugs for 23.1
* C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
* release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
* T-disaster-recovery

adityamaru commented Mar 2, 2023

In #97210 we moved Files, Descriptors and DescriptorChanges out of the backup manifest and into external SSTs written along with the manifest. If there are duplicate Files in the BackupManifest then the backup job fails with:

failed to write backup metadata Files SST: pebble: keys must be added in strictly increasing order: "\x12\xf6\xac\x89\xf7\x039\x88\x00\x01\x12data/844510561971896324.sst\x00\x01"/0,0#0,SET, "\x12\xf6\xac\x89\xf7\x039\x88\x00\x01\x12data/844510561971896324.sst\x00\x01"/0,0#0,SET

More details will be added to this issue as the investigation continues. Note that this feature is behind a feature flag that defaults to off in release-22.2, so we do not expect it to affect existing customer backups.
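To illustrate the failure mode, here is a minimal Go sketch. It is not the backupccl code and not the fix that landed in #98041. The `fileEntry` type, the `encodeKey` helper, and the `strictWriter` type are all hypothetical stand-ins: `strictWriter` only mimics a writer that rejects any key that is not strictly greater than the previous one. The sketch shows why two identical Files entries produce the error above, and how skipping adjacent equal keys after sorting would avoid it.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// fileEntry is a hypothetical stand-in for one Files entry in a backup
// manifest: a key span plus the path of the SST that holds its data.
type fileEntry struct {
	StartKey []byte
	EndKey   []byte
	Path     string
}

// encodeKey builds an illustrative sort key. The real key encoding is not
// described in this issue; this one only needs to be deterministic.
func encodeKey(f fileEntry) []byte {
	var buf bytes.Buffer
	buf.Write(f.StartKey)
	buf.WriteByte(0)
	buf.Write(f.EndKey)
	buf.WriteByte(0)
	buf.WriteString(f.Path)
	return buf.Bytes()
}

// strictWriter mimics a writer that requires strictly increasing keys.
// It is an assumption for this sketch, not a real pebble type.
type strictWriter struct {
	last []byte
	keys [][]byte
}

// Add rejects any key that is less than or equal to the previous key.
func (w *strictWriter) Add(key []byte) error {
	if w.last != nil && bytes.Compare(key, w.last) <= 0 {
		return fmt.Errorf("keys must be added in strictly increasing order: %q, %q", w.last, key)
	}
	w.last = key
	w.keys = append(w.keys, key)
	return nil
}

// writeFiles sorts the encoded keys and adds them to w. When dedup is true,
// a key equal to the one just before it in sorted order is skipped.
func writeFiles(w *strictWriter, files []fileEntry, dedup bool) error {
	keys := make([][]byte, len(files))
	for i, f := range files {
		keys[i] = encodeKey(f)
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	for i, k := range keys {
		if dedup && i > 0 && bytes.Equal(k, keys[i-1]) {
			continue
		}
		if err := w.Add(k); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	files := []fileEntry{
		{StartKey: []byte("a"), EndKey: []byte("b"), Path: "data/1.sst"},
		{StartKey: []byte("a"), EndKey: []byte("b"), Path: "data/1.sst"}, // duplicate
		{StartKey: []byte("b"), EndKey: []byte("c"), Path: "data/2.sst"},
	}
	fmt.Println("without dedup:", writeFiles(&strictWriter{}, files, false))
	fmt.Println("with dedup:", writeFiles(&strictWriter{}, files, true))
}
```

Sorting alone is not enough here, because equal keys remain adjacent after sorting. Whether dropping duplicates is the right behavior, as opposed to preventing them from entering the manifest in the first place, is a separate question that this sketch does not answer.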

Jira issue: CRDB-24983

Epic CRDB-21944

adityamaru added the C-bug label Mar 2, 2023
adityamaru added the release-blocker, branch-release-22.2, and branch-release-23.1 labels Mar 2, 2023
blathers-crl bot commented Mar 2, 2023

cc @cockroachdb/disaster-recovery

craig bot closed this as completed in a95ffcd Mar 7, 2023
craig bot closed this as completed in #98041 Mar 7, 2023
renatolabs added a commit to renatolabs/cockroach that referenced this issue Mar 9, 2023
This updates the `backup/mixed-version` roachtest to use the recently
introduced mixed-version roachtest framework (`mixedversion` package).

The main behavior exercised remains the same: backups are taken in
mixed-binary state, and those backups are restored and verified at the
end of the test. However, this commit also improves the coverage of
mixed-version backup testing in a few ways:

* **Randomization**. By virtue of using the new framework, most runs
will be different from one another since the order of actions taken by
the test will be different. Previously, backups would always be taken
with 2 nodes in the old version and 2 nodes in the new version. Now,
backups can be taken when an arbitrary number of nodes is running the
new version. As a consequence, it's also possible that some executions
will attempt backups when all nodes are running a new binary version,
but the cluster version itself has not been updated. Other points of
new randomization include the choice of the node's external dir where
backups are stored, which node to connect to when running certain
statements, and how long to wait between backups.

* **Backup Options**. Backups will randomly be created with
`revision_history` enabled, or with an `encryption_passphrase`.

* **Downgrades**. The cluster is also downgraded in mixed-version
tests. No downgrades happened in that test before this commit.

* **Workload**. Instead of using a fixed call to `generate_series` to
generate data between backups, the test now runs the `bank` workload
continuously during the test. A random wait between backups allows the
workload to make changes to the underlying table during the test and
for the backups to be taken while writes are taking place.

* **Finalization**. The test _may_ attempt to create a backup as the
upgrade is finalizing (i.e., migrations are running and cluster
version is advancing).
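As a rough illustration of the randomized choices listed above, the following Go sketch builds a `BACKUP` statement with a randomly chosen `nodelocal` destination node and randomly chosen options. It is not the roachtest code: the function names, the collection path, and the passphrase are hypothetical, and picking each option independently is an assumption, since the commit message does not say how the test combines them.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// backupStmt builds a BACKUP statement from explicit choices. The collection
// name and the passphrase are placeholders invented for this sketch.
func backupStmt(node int, revisionHistory, encrypt bool) string {
	stmt := fmt.Sprintf("BACKUP INTO 'nodelocal://%d/mixed-version-backup'", node)
	var opts []string
	if revisionHistory {
		opts = append(opts, "revision_history")
	}
	if encrypt {
		opts = append(opts, "encryption_passphrase = 'hypothetical-passphrase'")
	}
	if len(opts) > 0 {
		stmt += " WITH " + strings.Join(opts, ", ")
	}
	return stmt
}

// randomBackupStmt picks the destination node (1..numNodes) and each option
// at random. Choosing the options independently is an assumption.
func randomBackupStmt(rng *rand.Rand, numNodes int) string {
	node := 1 + rng.Intn(numNodes)
	return backupStmt(node, rng.Intn(2) == 0, rng.Intn(2) == 0)
}

func main() {
	// A fixed seed makes a given run reproducible, which helps when a
	// randomized test fails and needs to be replayed.
	rng := rand.New(rand.NewSource(42))
	for i := 0; i < 3; i++ {
		fmt.Println(randomBackupStmt(rng, 4))
	}
}
```

Separating the deterministic statement builder from the random picker keeps the builder easy to check on its own.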

In addition, this test will also see improved coverage as we make more
improvements to test plans generated by the `mixedversion` package.
These changes will create more backup scenarios in the future without
requiring any code changes to this test.

This test has already helped us uncover one backup bug (cockroachdb#97953).

Epic: CRDB-19321

Release note: None
renatolabs added a commit to renatolabs/cockroach that referenced this issue Mar 10, 2023
renatolabs added a commit to renatolabs/cockroach that referenced this issue Mar 14, 2023
craig bot pushed a commit that referenced this issue Mar 14, 2023
96991: roachtest: update mixed-version backup to use new framework r=srosenberg a=renatolabs


98398: statusccl: stop serving /_status/nodes to tenants r=matthewtodd a=matthewtodd

Fixes #98057.

This reverts the work of #93268, which is no longer necessary now that we are eagerly capturing region information at execution time in #95449.

Release note: None

Co-authored-by: Renato Costa
Co-authored-by: Matthew Todd