-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: pause changefeed during upgrade in mixed-versions test #88961
roachtest: pause changefeed during upgrade in mixed-versions test #88961
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @renatolabs)
pkg/cmd/roachtest/tests/mixed_version_cdc.go
line 269 at r1 (raw file):
Something is missing in the above sentence. Is it,
The goal is to ensure that database checks by the validator do not happen while an upgrade step is executing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @renatolabs)
pkg/cmd/roachtest/tests/mixed_version_cdc.go
line 302 at r1 (raw file):
// this, tests frequently fail with fingerprint mismatch errors // during the test // TODO: remove this once #88948 is resolved
Nit: TODO(renatolabs)
saves someone running git blame
:)
b7ebe78
to
84b0f05
Compare
In cockroachdb#87154, we started gracefully shutting down nodes in mixed-version tests. As a consequence of that change, the `cdc/mixed-versions` roachtest started failing. After some investigation, it was discovered that the issue has nothing to do with upgrades but instead with the the different way nodes are stopped (if we don't use different binaries and just stop and restart the same binaries, running current `master`, the issue is still observed). In order to stop that test from failing, this commit implements a workaround solution of stopping the changefeed while an upgrade is in process, which makes the test pass reliably. This commit also removes the retry logic from the FingerprintValidator code as it adds a dimension of complexity that is not needed in the tests. The validator will now block if trying to perform a SQL update until the upgrade process is finished. The actual issue that leads to this failure is being investigated in cockroachdb#88948. Fixes cockroachdb#87251. Release note: None.
84b0f05
to
597884f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_cdc.go
line 269 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Something is missing in the above sentence. Is it,
The goal is to ensure that database checks by the validator do not happen while an upgrade step is executing?
Yep, good catch, fixed.
pkg/cmd/roachtest/tests/mixed_version_cdc.go
line 302 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Nit:
TODO(renatolabs)
saves someone runninggit blame
:)
Done
// crdbUpgradeStep is a wrapper to steps that upgrade the cockroach | ||
// binary running in the cluster. It makes sure we hold exclusive | ||
// access to the `crdbUpgrading` lock while the upgrade is in process | ||
func (cmvt *cdcMixedVersionTester) crdbUpgradeStep(step versionStep) versionStep { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call runCDCMixedVersion
somethin like runCDCMixedVersionPaused
to make it clear that we're not testing the running case. We'd want to be testing this paused case as well anyway. Could also be good to make this crdbUpgradeStep
explicitly somethin like crdbUpgradeStepPaused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that in a separate PR, properly refactoring this test to take parameters. For this one I'd like to just address #87251.
We'd want to be testing this paused case as well anyway
Is this what we recommend customers to do? Or something that you know customers do anyway when they are upgrading? Curious about what's the more common approach to deal with changefeeds during upgrades.
Closing this in favor of an upcoming fix that actually deals with the problem. |
In #87154, we started gracefully shutting down nodes in mixed-version tests. As a consequence of that change, the
cdc/mixed-versions
roachtest started failing. After some investigation, it was discovered that the issue has nothing to do with upgrades but instead with the the different way nodes are stopped (if we don't use different binaries and just stop and restart the same binaries, running currentmaster
, the issue is still observed).In order to stop that test from failing, this commit implements a workaround solution of stopping the changefeed while an upgrade is in process, which makes the test pass reliably. This commit also removes the retry logic from the FingerprintValidator code as it adds a dimension of complexity that is not needed in the tests. The validator will now block if trying to perform a SQL update until the upgrade process is finished.
The actual issue that leads to this failure is being investigated in #88948.
Fixes #87251.
Release note: None.