Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: pause changefeed during upgrade in mixed-versions test #88961

Conversation

renatolabs
Copy link
Contributor

In #87154, we started gracefully shutting down nodes in mixed-version tests. As a consequence of that change, the cdc/mixed-versions roachtest started failing. After some investigation, it was discovered that the issue has nothing to do with upgrades but instead with the the different way nodes are stopped (if we don't use different binaries and just stop and restart the same binaries, running current master, the issue is still observed).

In order to stop that test from failing, this commit implements a workaround solution of stopping the changefeed while an upgrade is in process, which makes the test pass reliably. This commit also removes the retry logic from the FingerprintValidator code as it adds a dimension of complexity that is not needed in the tests. The validator will now block if trying to perform a SQL update until the upgrade process is finished.

The actual issue that leads to this failure is being investigated in #88948.

Fixes #87251.

Release note: None.

@renatolabs renatolabs requested review from a team and srosenberg and removed request for a team September 28, 2022 21:12
@renatolabs renatolabs requested a review from a team as a code owner September 28, 2022 21:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_cdc.go line 269 at r1 (raw file):
Something is missing in the above sentence. Is it,

The goal is to ensure that database checks by the validator do not happen while an upgrade step is executing?

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_cdc.go line 302 at r1 (raw file):

		// this, tests frequently fail with fingerprint mismatch errors
		// during the test
		// TODO: remove this once #88948 is resolved

Nit: TODO(renatolabs) saves someone running git blame :)

In cockroachdb#87154, we started gracefully shutting down nodes in mixed-version
tests. As a consequence of that change, the `cdc/mixed-versions`
roachtest started failing. After some investigation, it was discovered
that the issue has nothing to do with upgrades but instead with the
the different way nodes are stopped (if we don't use different
binaries and just stop and restart the same binaries, running current
`master`, the issue is still observed).

In order to stop that test from failing, this commit implements a
workaround solution of stopping the changefeed while an upgrade is in
process, which makes the test pass reliably. This commit also removes
the retry logic from the FingerprintValidator code as it adds a
dimension of complexity that is not needed in the tests. The validator
will now block if trying to perform a SQL update until the upgrade
process is finished.

The actual issue that leads to this failure is being investigated in cockroachdb#88948.

Fixes cockroachdb#87251.

Release note: None.
@renatolabs renatolabs force-pushed the cdc-mixed-versions-workaround branch from 84b0f05 to 597884f Compare September 29, 2022 13:47
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @srosenberg)


pkg/cmd/roachtest/tests/mixed_version_cdc.go line 269 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Something is missing in the above sentence. Is it,

The goal is to ensure that database checks by the validator do not happen while an upgrade step is executing?

Yep, good catch, fixed.


pkg/cmd/roachtest/tests/mixed_version_cdc.go line 302 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Nit: TODO(renatolabs) saves someone running git blame :)

Done

// crdbUpgradeStep is a wrapper to steps that upgrade the cockroach
// binary running in the cluster. It makes sure we hold exclusive
// access to the `crdbUpgrading` lock while the upgrade is in process
func (cmvt *cdcMixedVersionTester) crdbUpgradeStep(step versionStep) versionStep {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call runCDCMixedVersion somethin like runCDCMixedVersionPaused to make it clear that we're not testing the running case. We'd want to be testing this paused case as well anyway. Could also be good to make this crdbUpgradeStep explicitly somethin like crdbUpgradeStepPaused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that in a separate PR, properly refactoring this test to take parameters. For this one I'd like to just address #87251.

We'd want to be testing this paused case as well anyway

Is this what we recommend customers to do? Or something that you know customers do anyway when they are upgrading? Curious about what's the more common approach to deal with changefeeds during upgrades.

@renatolabs
Copy link
Contributor Author

Closing this in favor of an upcoming fix that actually deals with the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: cdc/mixed-versions failed
4 participants