Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: add COCKROACH_UPGRADE_TO_DEV_VERSION to DefaultEnvVars #88005

Conversation

srosenberg
Copy link
Member

@srosenberg srosenberg commented Sep 15, 2022

Recent changes to cockroach_versions (logic) now require setting COCKROACH_UPGRADE_TO_DEV_VERSION environment variable in order to allow upgrading a stable release data-dir to a dev. version. The PR [1] which introduced this env. var. did not correctly backport the change to all the roachtests which perform this type of upgrade.

Since all roachtests which exercise upgrade paths are intended to test this upgrade scenario, we set COCKROACH_UPGRADE_TO_DEV_VERSION by default in roachprod instead of polluting the roachtests with more config. settings.

Consequently, MakeClusterSettings now returns the default ClusterSettings which includes COCKROACH_UPGRADE_TO_DEV_VERSION; generateStartCmd ensures it's passed into cockroach env. via start.sh.

Release note: None

Release justification: bug fix in roachtests.

Resolves:
#87675
#87687

[1] #87468

@srosenberg srosenberg requested a review from a team as a code owner September 15, 2022 17:12
@srosenberg srosenberg requested review from herkolategan and removed request for a team September 15, 2022 17:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@srosenberg srosenberg requested review from renatolabs and dt September 15, 2022 17:12
Recent changes to cockroach_versions (logic) now require setting
COCKROACH_UPGRADE_TO_DEV_VERSION environment variable in order to
allow upgrading a stable release data-dir to a dev. version.
The PR [1] which introduced this env. var. did not correctly backport
the change to all the roachtests which perform this type of upgrade.

Since all roachtests which exercise upgrade paths are intended to
test this upgrade scenario, we set COCKROACH_UPGRADE_TO_DEV_VERSION
by default in roachprod instead of polluting the roachtests with more
config. settings.

Consequently, MakeClusterSettings now returns the default
ClusterSettings which includes COCKROACH_UPGRADE_TO_DEV_VERSION;
generateStartCmd ensures it's passed into cockroach env. via start.sh.

Release note: None

Release justification: bug fix in roachtests.

Resolves:
  cockroachdb#87675
  cockroachdb#87687

[1] cockroachdb#87468
@srosenberg srosenberg force-pushed the sr/default_roachprod_to_COCKROACH_UPGRADE_TO_DEV_VERSION branch from ac9a7c5 to ac5ee91 Compare September 15, 2022 17:14
@dt
Copy link
Member

dt commented Sep 15, 2022

in #87826, @ajstorm has been arguing the opposite direction: we should be aiming to be more narrow in where we pass the env var, to more closely mirror how we expect a user to use it (i.e. only when you know you're doing a permanent stable->dev upgrade).

I'm mostly ambivalent; roachprod should only be used in dev and test cases anyway so I don't feel like it is particularly risky to default it as we're doing here, but it does mean we spend most of the time interacting with it in a way our users won't. 🤷

@srosenberg
Copy link
Member Author

I'm mostly ambivalent; roachprod should only be used in dev and test cases anyway so I don't feel like it is particularly risky to default it as we're doing here, but it does mean we spend most of the time interacting with it in a way our users won't. 🤷

This new (upgrade safety valve) feature is rather controversial, imo. E.g., what are concrete test scenarios, other than upgrade, where you would explicitly need to unset COCKROACH_UPGRADE_TO_DEV_VERSION? If you just want to disable auto upgrade unconditionally, cluster.preserve_downgrade_option does that. The only roachtests which use the latter are those testing the upgrade paths.

As to Adam's comment, the intended user in this case is almost always an engineer; plus, more often than not, they would be using a dev. build. Roachprod defaults can be overridden if need be; it is very much an internal tool on which roachtests depend.

@ajstorm
Copy link
Collaborator

ajstorm commented Sep 16, 2022

As to Adam's comment, the intended user in this case is almost always an engineer; plus, more often than not, they would be using a dev. build. Roachprod defaults can be overridden if need be; it is very much an internal tool on which roachtests depend.

That all makes sense to me. What I wonder though, is how many tests we'll have where the cluster setting is not set? I'd imagine these tests could only run on the release branch, but that we'd want to have some that are automated. Do those tests exist today?

@dt
Copy link
Member

dt commented Sep 16, 2022

With this change, it'll just be set by default in all roachprod clusters, including those used by roachtest, all the time and individual tests won't need to set it or unset it at all.

@srosenberg
Copy link
Member Author

What I wonder though, is how many tests we'll have where the cluster setting is not set? I'd imagine these tests could only run on the release branch, but that we'd want to have some that are automated. Do those tests exist today?

"cluster setting" is ambiguous in the context of this discussion. If you mean the new environment variable, then there aren't any to my knowledge. If you mean cluster.preserve_downgrade_option, then the roachtests,

pkg/cmd/roachtest/tests/autoupgrade.go
pkg/cmd/roachtest/tests/multitenant_upgrade.go
pkg/cmd/roachtest/tests/version.go
pkg/cmd/roachtest/tests/versionupgrade.go

and the unit tests in pkg/server/version_cluster_test.go (re)set it.

With this change, it'll just be set by default in all roachprod clusters, including those used by roachtest, all the time and individual tests won't need to set it or unset it at all.

That's correct. I prefer to set this as a global default in roachprod as opposed to having to sprinkle it across different roachtests. Each new configuration setting multiples the already exponential state space by 2. It's also quite easy to forget it (in a new upgrade test) and waste time debugging why the upgrade failed.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan)

@srosenberg
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 20, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 20, 2022

Build succeeded:

@ajstorm
Copy link
Collaborator

ajstorm commented Sep 20, 2022

If you mean the new environment variable, then there aren't any to my knowledge.

This is exactly my concern. Can we have at least one test that runs on the release branches which validates that the environment variable doesn't need to be set for the migration to complete successfully. It would suck if we didn't have this test and we accidentally broke the behaviour, causing all customers to have to set the env var (at least, until we fixed the bug).

@renatolabs
Copy link
Contributor

renatolabs commented Sep 20, 2022

Alternatively, it would also be nice if there was an easy way to build master as a "non-dev build" in the tests (currently I believe a code change is necessary). In the context of these tests, we want to pretend that master is the new release.

@srosenberg
Copy link
Member Author

This is exactly my concern. Can we have at least one test that runs on the release branches which validates that the environment variable doesn't need to be set for the migration to complete successfully. It would suck if we didn't have this test and we accidentally broke the behaviour, causing all customers to have to set the env var (at least, until we fixed the bug).

This change is guarded by developmentBranch [1] which presumably would be automatically set to false on every subsequent release branch. (@celiala or @dt please confirm) In light of the guard, this env. var affects only the case of automated upgrades from a release to dev.

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/clusterversion/cockroach_versions.go#L481

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants