Allow delayed downscale of subset of pods #156

Merged: pstibrany merged 5 commits into main from fix-issue-155 on Jun 17, 2024
Conversation

pstibrany (Member):

This PR implements earlier downscale of pods that have already reached their configured delay, even if not ALL pods have reached it yet.

Fixes #155
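
The idea, as a minimal Go sketch (hypothetical names and signature, not the actual rollout-operator code): a StatefulSet always removes the highest-ordinal pods first, so the operator can scale down now to cover the contiguous trailing run of pods whose delay has already elapsed, instead of waiting for every pod in the downscale range.

	package sketch

	import "time"

	// allowedDesiredReplicas returns the replica count the StatefulSet can be scaled
	// down to right now. preparedAt maps pod ordinal to the time the pod was prepared
	// for delayed downscale (all names here are illustrative, not the real code).
	func allowedDesiredReplicas(currentReplicas, desiredReplicas int32, delay time.Duration, preparedAt map[int32]time.Time, now time.Time) int32 {
		allowed := currentReplicas
		// Only a contiguous suffix of pods whose delay has elapsed can be removed,
		// because StatefulSets delete the highest-ordinal pods first.
		for ord := currentReplicas - 1; ord >= desiredReplicas; ord-- {
			prepared, ok := preparedAt[ord]
			if !ok || now.Sub(prepared) < delay {
				break
			}
			allowed = ord
		}
		return allowed
	}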

pstibrany requested a review from pracucci on June 17, 2024, 09:42
pracucci (Collaborator) left a comment:

Nice job Peter! LGTM. I left a couple of minor comments that I would be glad if you could look at before merging. Thanks!

 	}
 
 	delay, prepareURL, err := parseDelayedDownscaleAnnotations(sts.GetAnnotations())
 	if delay == 0 || prepareURL == nil || err != nil {
-		return err
+		return desiredReplicas, err
pracucci (Collaborator):

If there was an error, it would be more correct to return currentReplicas. Today the caller doesn't scale to updatedDesiredReplicas in case of error, but I would like to protect against future bugs.
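
One way the suggested error handling could look inside this function (a sketch only; the actual fix in f7fcdb0 may differ):

	delay, prepareURL, err := parseDelayedDownscaleAnnotations(sts.GetAnnotations())
	if err != nil {
		// On error, report the replica count we know is currently running, so a caller
		// that ever scaled to the returned value could not downscale by accident.
		return currentReplicas, err
	}
	if delay == 0 || prepareURL == nil {
		// No delayed downscale configured: scaling can proceed to the desired count.
		return desiredReplicas, nil
	}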

pstibrany (Member, Author):

Done in f7fcdb0.

pstibrany (Member, Author):

And follow-up fix in 7e79c3b.

pracucci (Collaborator):

> And follow-up fix in 7e79c3b.

Exactly. That's what I had in mind.

pstibrany (Member, Author):

Fortunately the test caught my mistake :)

			// No change in the number of replicas: don't log because this will be the result most of the time.
			continue
		}

		// We're going to change number of replicas on the statefulset.
		// If there is delayed downscale configured on the statefulset, we will first handle delay part, and only if that succeeds,
		// continue with downscaling or upscaling.
		if err := checkScalingDelay(ctx, c.logger, sts, client, currentReplicas, desiredReplicas); err != nil {
			level.Warn(c.logger).Log("msg", "not scaling statefulset due to failed scaling delay check", "group", groupName, "name", sts.GetName(), "currentReplicas", currentReplicas, "desiredReplicas", desiredReplicas, "err", err)
		var desiredReplicas int32
pracucci (Collaborator):

To make the code more robust against future changes, I suggest not initialising this to 0. We could simply do:

Suggested change:
-	var desiredReplicas int32
+	desiredReplicas, err := checkScalingDelay(ctx, c.logger, sts, client, currentReplicas, referenceResourceDesiredReplicas)
+	if err != nil {
+		// ...
+	}

pstibrany (Member, Author):

Good idea, done in f7fcdb0.
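
For context, roughly how the adopted suggestion sits at the call site from the snippet above (a sketch; the surrounding loop and exact log fields are assumptions, not necessarily the merged code):

	desiredReplicas, err := checkScalingDelay(ctx, c.logger, sts, client, currentReplicas, referenceResourceDesiredReplicas)
	if err != nil {
		level.Warn(c.logger).Log("msg", "not scaling statefulset due to failed scaling delay check",
			"group", groupName, "name", sts.GetName(),
			"currentReplicas", currentReplicas, "desiredReplicas", referenceResourceDesiredReplicas, "err", err)
		continue
	}
	// desiredReplicas now starts from the value computed by checkScalingDelay rather than 0,
	// so a later change cannot accidentally scale the statefulset down to zero.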

pstibrany merged commit 5dce3cc into main on Jun 17, 2024. 6 checks passed.
pstibrany deleted the fix-issue-155 branch on June 17, 2024 at 13:11.
Linked issue closed by this pull request: rollout-operator: faster scale down of prepared ingesters (#155).