release-22.2.17-rc: importer: only check import *atomicity* in TestImportWorkerFailure #114055

@yuzefovich yuzefovich commented Nov 8, 2023

Backport 1/1 commits from #108626.

/cc @cockroachdb/release


Five years ago, in #26881, we changed import to retry on worker failures, which made imports much more resilient to transient failures like nodes going down. As part of this work we created TestImportWorkerFailure, which shuts down one node during an import and checks that the import succeeded. Unfortunately, this test was checked in skipped because, although imports were much more resilient to node failures, they were not completely resilient in every possible scenario, making the test flaky.

Two months ago, in #105712, we unskipped this test and discovered that in some cases the import statement succeeded but only imported a partial dataset. This non-atomicity seems like a bigger issue than whether the import is able to succeed in every possible transient failure scenario, and is tracked separately in #108547.

This PR changes TestImportWorkerFailure to remove successful import as a necessary condition for test success. Instead, the test now only checks whether the import was atomic; that is, whether a successful import imported all data or a failed import imported none. This is more in line with what we can guarantee about imports today.
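
For illustration, the atomicity condition described above can be sketched as follows. This is a minimal sketch and not the code in this PR: `checkImportAtomicity`, `errNodeDown`, and the row counts are hypothetical names, and it assumes the caller reports a missing or empty target table as zero rows.

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeDown is a hypothetical stand-in for the error an import might
// return when a worker node is shut down mid-import.
var errNodeDown = errors.New("node unavailable")

// checkImportAtomicity is a hypothetical helper, not the actual test code.
// importErr is the error returned by the import (nil on success), gotRows is
// the row count observed in the target table afterwards (assumed to be 0 if
// the table is missing), and wantRows is the number of rows in the source.
func checkImportAtomicity(importErr error, gotRows, wantRows int) error {
	if importErr == nil {
		// A successful import must have imported all of the data.
		if gotRows != wantRows {
			return fmt.Errorf("import succeeded but found %d of %d rows", gotRows, wantRows)
		}
		return nil
	}
	// A failed import must have imported none of the data.
	if gotRows != 0 {
		return fmt.Errorf("import failed (%v) but left %d rows behind", importErr, gotRows)
	}
	return nil
}

func main() {
	// Success or failure of the import alone no longer decides the outcome;
	// only the combination of result and observed data does.
	fmt.Println(checkImportAtomicity(nil, 1000, 1000) == nil)         // full success
	fmt.Println(checkImportAtomicity(errNodeDown, 0, 1000) == nil)    // clean failure
	fmt.Println(checkImportAtomicity(nil, 400, 1000) == nil)          // partial data after success
	fmt.Println(checkImportAtomicity(errNodeDown, 400, 1000) == nil)  // partial data after failure
}
```

Under this sketch, an import that fails because a node went down is acceptable as long as it leaves no rows behind, while a "successful" import with a partial dataset (the case tracked in #108547) is reported as a violation.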

Fixes: #102839

Release note: None

Release justification: test-only change.

@yuzefovich yuzefovich requested a review from a team as a code owner November 8, 2023 17:52

blathers-crl bot commented Nov 8, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

@blathers-crl blathers-crl bot added the backport label Nov 8, 2023
michae2 commented Nov 15, 2023

Does this need to be part of the .17 release branch? Once we cut the .17 release we won't be testing the release branch any more.

yuzefovich commented

We did see a failure on this branch (#113844), so I don't see any downside to merging this to prevent further pointless failures that might occur before the branch is deleted. It's a test-only change after all, so it carries no risk.

@michae2 michae2 left a comment

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)

@yuzefovich yuzefovich merged commit 0ad77cf into cockroachdb:release-22.2.17-rc Nov 15, 2023
2 checks passed
@yuzefovich yuzefovich deleted the backport22.2.17-rc-108626 branch November 15, 2023 17:14