Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrademanager: defer retrieval of cluster ID #113382

Merged
merged 2 commits into from
Nov 7, 2023

Conversation

stevendanna
Copy link
Collaborator

Previously, the upgrade manager's constructor took the cluster ID
directly.

However, the cluster ID may not be initialised at the point of
constructing the manager. This isn't typically a problem because in
most cases, the only upgrade that require the cluster ID doesn't
actually take the cluster ID from the manager. Rather, it gets the
cluster ID from the execution context of the migration job on
resumption.

But, tests such as TestRetriesWithExponentialBackoff set the
DontUseJobs testing hook. In this case, the migration manager runs the
upgrades directly, passing its own copy of the cluster ID, which is
unitialized.

To fix this, we pass the cluster ID container rather than the cluster
ID so that we can defer retrieving it until we actually run the
migrations, at which point it should be set.

Informs #112763

Release note: none

@stevendanna stevendanna requested review from a team as code owners October 31, 2023 09:09
@stevendanna stevendanna requested review from dt and removed request for a team October 31, 2023 09:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna added the backport-23.2.x Flags PRs that need to be backported to 23.2. label Nov 1, 2023
@stevendanna
Copy link
Collaborator Author

bors r=adityamaru

@craig
Copy link
Contributor

craig bot commented Nov 1, 2023

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig
Copy link
Contributor

craig bot commented Nov 1, 2023

GitHub status checks took too long to complete, so bors is giving up. You can adjust bors configuration to have it wait longer if you like.

This changes more of the helper functions in the test to take a
testing.T so that failures inside the function are correctly called on
the subtests rather than the parent test.

Epic: none

Release note: None
Previously, the upgrade manager's constructor took the cluster ID
directly.

However, the cluster ID may not be initialised at the point of
constructing the manager. This isn't typically a problem because in
most cases, the only upgrade that _require_ the cluster ID doesn't
actually take the cluster ID from the manager. Rather, it gets the
cluster ID from the execution context of the migration job on
resumption.

But, tests such as TestRetriesWithExponentialBackoff set the
DontUseJobs testing hook. In this case, the migration manager runs the
upgrades directly, passing its own copy of the cluster ID, which is
unitialized.

To fix this, we pass the cluster ID container rather than the cluster
ID so that we can defer retrieving it until we actually run the
migrations, at which point it should be set.

Informs cockroachdb#112763

Release note: none
@stevendanna
Copy link
Collaborator Author

bors r=adityamaru

@craig
Copy link
Contributor

craig bot commented Nov 7, 2023

Build succeeded:

@craig craig bot merged commit 1f402df into cockroachdb:master Nov 7, 2023
3 checks passed
craig bot pushed a commit that referenced this pull request Dec 6, 2023
113494: jobs: deflake TestRetriesWithExponentialBackoff r=adityamaru a=stevendanna

This attempts to deflake TestRetriesWithExponentialBackoff.

The majority of the problems with this test are the result of pause
clearing the backoff state and thus needing to take a different path
through most of this test.

If this PR doesn't solve the flakes, I recommend that we just remove those
two test cases and write a new test for them.  

The only reason I haven't done that here is that this test has revealed interesting
job system behaviour in the past since it is so involved.

See individual commits for more details.

Fixes #112763

First two commits are form #113382

Co-authored-by: Steven Danna <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants