release-23.1: backupccl: fingerprint 15GB restore roachtests #100124

Merged
merged 1 commit into from
Mar 31, 2023

@blathers-crl blathers-crl bot commented Mar 30, 2023

Backport 1/1 commits from #99792 on behalf of @msbutler.

/cc @cockroachdb/release


Previously, restore roachtests had little ability to detect data corruption regressions across runs. This patch adds that ability: a restore roachtest writer can now run a stripped fingerprint after a restore and assert that it matches the fingerprint hardcoded in the test spec.

For now, the fingerprint check runs only on the restore roachtests that restore 15GB of data. The check takes about as long as the restore itself (around 3 minutes), so before using it on larger tests we ought to consider performance improvements to the fingerprinting tool. The 15GB tests are:

  • restore/pause/tpce/15GB/aws/nodes=4/cpus=8 (used to restore 80GB)
  • restore/tpce/15GB/aws/nodes=4/cpus=8 (new test)
  • restore/nodeShutdown/worker (used to restore 80GB)
  • restore/nodeShutdown/coordinator (used to restore 80GB)

This patch also switches the node shutdown tests and the paused restore test to the smaller 15GB tpce fixture, which speeds up the test runs.

Informs #98779

Release note: none


Release justification: test infra change

@blathers-crl blathers-crl bot requested a review from a team as a code owner March 30, 2023 14:25
@blathers-crl blathers-crl bot requested review from herkolategan and srosenberg and removed request for a team March 30, 2023 14:25
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99792 branch from 79eddfe to e67bd5e Compare March 30, 2023 14:25

blathers-crl bot commented Mar 30, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot requested review from rhu713 and smg260 March 30, 2023 14:25
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99792 branch from 2d67ad9 to 0a07809 Compare March 30, 2023 14:25
@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Mar 30, 2023
@cockroach-teamcity

This change is Reviewable


@herkolategan herkolategan left a comment

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lidorcarmel, @msbutler, @renatolabs, @rhu713, @smg260, and @srosenberg)

@msbutler msbutler merged commit 5f85986 into release-23.1 Mar 31, 2023
@msbutler msbutler deleted the blathers/backport-release-23.1-99792 branch March 31, 2023 16:07