Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: change some cluster settings during mixed-version backups #103228

Merged

Conversation

renatolabs
Copy link
Contributor

@renatolabs renatolabs commented May 12, 2023

This commit makes some final (for now) changes to the
backup-restore/mixed-version roachtest. Specifically:

  • we set some backup/restore related cluster settings. These are
    publicly documented settings and should help expose corner cases that
    might be harder to come up naturally using the default settings. This
    is an area that is known to need more tests, as described in a recent
    postmortem [1].

  • introduce a background function that executes statements that lead
    to rows being inserted into system tables, particularly those that are
    generally empty in most tests.

  • simplify the workload setup in the test: the bank workload is
    responsible for testing edge cases, while tpcc is a workload that
    should better represent customer workloads.

  • verify that backups taken in mixed-version can be restored both in
    the previous version and in the next version. Previously, we were only
    testing the next version.

Note that most of these changes are not specificaly related to the
mixed-version context this test is in. In the future, these features
should be packaged in a format that is easier to consume by other
tests.

[1] https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/3013804060/Postmortem+101963+revision+history+backups

Epic: none

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@renatolabs renatolabs force-pushed the rc/7-randomized-system-settings branch 3 times, most recently from aebef6e to 8dad362 Compare May 16, 2023 14:26
@renatolabs renatolabs force-pushed the rc/7-randomized-system-settings branch 7 times, most recently from a628f72 to ab002a9 Compare May 17, 2023 19:43
@renatolabs renatolabs marked this pull request as ready for review May 17, 2023 19:45
@renatolabs renatolabs requested a review from a team as a code owner May 17, 2023 19:45
@renatolabs renatolabs requested review from herkolategan and smg260 and removed request for a team May 17, 2023 19:45
@renatolabs
Copy link
Contributor Author

renatolabs commented May 17, 2023

This is ready for review. There are currently 3 known flakes (that are likely real issues); these have been documented elsewhere: see #103239, #103481, and #103597.

@smg260
Copy link
Contributor

smg260 commented May 17, 2023

pkg/cmd/roachtest/tests/mixed_version_backup.go line 67 at r1 (raw file):

	finalizingSuffix = "_finalizing"

	// probabilities that the test will attemt to pause a backup job.

typo

@smg260
Copy link
Contributor

smg260 commented May 17, 2023

pkg/cmd/roachtest/tests/mixed_version_backup.go line 71 at r1 (raw file):

	alwaysPause = 1

	// the test will not pause a backup job more than `maxPause` times.

typo: maxPauses

@smg260
Copy link
Contributor

smg260 commented May 17, 2023

pkg/cmd/roachtest/tests/mixed_version_backup.go line 1561 at r1 (raw file):

			}

			l.Printf("pausing job %d", jobID)

nit: since you have the duration, it wouldn't hurt including it in the lo

@smg260
Copy link
Contributor

smg260 commented May 17, 2023

pkg/cmd/roachtest/tests/mixed_version_backup.go line 1561 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

nit: since you have the duration, it wouldn't hurt including it in the lo

*log

@smg260
Copy link
Contributor

smg260 commented May 17, 2023

pkg/cmd/roachtest/tests/mixed_version_backup.go line 1770 at r1 (raw file):

		{
			{Plan: onPrevious, Execute: onNext, PauseProbability: defaultPauseProbability}, // full
			{Plan: onPrevious, Execute: onNext, PauseProbability: defaultPauseProbability}, // incremental

just clarifying: this is incremental, because it depends on the state arrived at after the previous spec?

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 283 at r1 (raw file):

//
// Do NOT use the rng returned by this function in mixedversion hooks
// (functions passed to `InMixeddVersion` and similar). Instead, use

typo: InMixedVersion


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1928 at r1 (raw file):

// resetCluster wipes the entire cluster and starts it again with the
// current latest binary. This is done before we attempt restoring a

should this say starts it again with the specified version binary?


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1938 at r1 (raw file):

	}

	cockroachPath := clusterupgrade.BinaryPathFromVersion(version)

nit: I think this func reads better as BinaryPathForVersion, not sure


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1990 at r1 (raw file):

			if version != clusterupgrade.MainVersion && strings.Contains(collection.name, finalizingSuffix) {
				// Do not attempt to restore, in the previous version, a
				// backup that was taken while the cluster was finaizing, as

typo: finalizing

This commit makes some final (for now) changes to the
`backup-restore/mixed-version` roachtest. Specifically:

* we set some backup/restore related cluster settings. These are
publicly documented settings and should help expose corner cases that
might be harder to come up naturally using the default settings. This
is an area that is known to need more tests, as described in a recent
postmortem [1].

* introduce a background function that executes statements that lead
to rows being inserted into system tables, particularly those that are
generally empty in most tests.

* simplify the workload setup in the test: the `bank` workload is
responsible for testing edge cases, while `tpcc` is a workload that
should better represent customer workloads.

* verify that backups taken in mixed-version can be restored both in
the previous version and in the next version. Previously, we were only
testing the next version.

Note that most of these changes are not specificaly related to the
mixed-version context this test is in. In the future, these features
should be packaged in a format that is easier to consume by other
tests.

[1] https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/3013804060/Postmortem+101963+revision+history+backups

Epic: none

Release note: None
@renatolabs renatolabs force-pushed the rc/7-randomized-system-settings branch from ab002a9 to 19383d7 Compare May 19, 2023 15:04
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @smg260)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 283 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

typo: InMixedVersion

Fixed.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 67 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

typo

Fixed.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 71 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

typo: maxPauses

Fixed.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1561 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

*log

Added.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1770 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

just clarifying: this is incremental, because it depends on the state arrived at after the previous spec?

These are the specs for a full backup (first element in the array) and an incremental backup taken on top of the full (second element in the array).


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1928 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

should this say starts it again with the specified version binary?

Good catch, fixed.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1938 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

nit: I think this func reads better as BinaryPathForVersion, not sure

I agree, but I'll make this change in a commit that refactors the mixedversion package instead of here.


pkg/cmd/roachtest/tests/mixed_version_backup.go line 1990 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

typo: finalizing

Fixed.

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan)

@renatolabs
Copy link
Contributor Author

TFTR!

bors r=smg260

@craig
Copy link
Contributor

craig bot commented May 24, 2023

Build succeeded:

@dt
Copy link
Member

dt commented May 29, 2023

This test seems to not play nicely with --versions-binary-override. It seems like it calls clusterupgrade.StartWithBinary with the output of directly calling BinaryPathForVersion, even though the path uploaded to by clusterupgrade.UploadVersion will only use BinaryPathForVersion if there is no override. Do we need to plumb test.Test down around move of mixedversion or something?

It also seems like if nodes fail to start, this roachtest crashes when it tries to call refreshBinaryVersions on error, since that assumes that tr.connCache is setup but it might not be, leading to out-of-bounds.

@renatolabs
Copy link
Contributor Author

This test seems to not play nicely with --versions-binary-override

Good point, I'll make a note to address this in the mixedversion framework.

It also seems like if nodes fail to start, this roachtest crashes when it tries to call refreshBinaryVersions on error

This should have been fixed by #103799.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants