Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: improve basic canary approximation accuracy and honor maxSurge #1759

Merged
merged 1 commit into from
Jan 14, 2022

Conversation

jessesuen
Copy link
Member

@jessesuen jessesuen commented Jan 12, 2022

This change fixes the issue where the basic canary calculation would possibly round up the canary/stable counts and could exceed maxSurge: 0. The issue made it not possible to use rollouts for use cases where the replicas needed to be capped to a fixed size, such as a kafka consumer.

As part of the change, we will now choose between stable/canary replica counts summing up to spec.replicas or spec.replicas + 1 depending on:

  • which can give us a better approximation of the desired canary weight
  • which will not violate maxSurge

Prior to this fix, we would end up always rounding up to spec.replicas + 1 unless the desired weight was evenly divisible to 100.

For a deeper understanding of the improvements, see the TestApproximateWeightedNewStableReplicaCounts table test for examples of the resulting calculations.

Resolves #1707

Here are some examples of differences between v1.1 vs. v1.2 behavior:

replicas maxSurge desired weight v1.1 canary/stable count v1.2 canary/stable count note
1 0 10% 1/1 0/1 no longer violates maxSurge
1 0 50% 1/1 1/0 no longer violates maxSurge
3 1 33% 1/3 1/2 better approximation of desired weight
3 0 50% 2/2 2/1 no longer violates maxSurge

Signed-off-by: Jesse Suen [email protected]

@jessesuen jessesuen force-pushed the canary-approximation branch from 30e45d9 to 6208ff4 Compare January 12, 2022 11:20
@codecov
Copy link

codecov bot commented Jan 12, 2022

Codecov Report

Merging #1759 (22b2e01) into master (e2201d0) will increase coverage by 0.02%.
The diff coverage is 92.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1759      +/-   ##
==========================================
+ Coverage   82.02%   82.05%   +0.02%     
==========================================
  Files         116      116              
  Lines       16096    16212     +116     
==========================================
+ Hits        13203    13303     +100     
- Misses       2218     2229      +11     
- Partials      675      680       +5     
Impacted Files Coverage Δ
utils/replicaset/canary.go 89.49% <92.45%> (+0.07%) ⬆️
rollout/experiment.go 82.69% <0.00%> (-1.29%) ⬇️
rollout/analysis.go 78.90% <0.00%> (-0.70%) ⬇️
utils/analysis/factory.go 94.02% <0.00%> (-0.42%) ⬇️
pkg/apis/rollouts/validation/validation.go 94.06% <0.00%> (ø)
.../apis/rollouts/validation/validation_references.go 85.19% <0.00%> (+0.05%) ⬆️
utils/analysis/helpers.go 78.08% <0.00%> (+0.14%) ⬆️
analysis/analysis.go 86.09% <0.00%> (+0.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2201d0...22b2e01. Read the comment docs.

@jessesuen jessesuen marked this pull request as ready for review January 12, 2022 19:13
@jessesuen jessesuen changed the title fix: improve basic canary approximation accuracy and honor maxSurge fix!: improve basic canary approximation accuracy and honor maxSurge Jan 12, 2022
@johnniee
Copy link

@jessesuen Thanks for all your help with this issue! Much appreciate that!
Hope we will find another way to contribute to the project some other day :)
In the meantime, you can add us to the USERS.md file :)
014fa4d

@jessesuen jessesuen force-pushed the canary-approximation branch from 6208ff4 to 22b2e01 Compare January 13, 2022 22:12
@sonarcloud
Copy link

sonarcloud bot commented Jan 13, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.5% 0.5% Duplication

@harikrongali
Copy link
Contributor

LGTM. 2 e2e tests are failing. I wanted to confirm if they are flaky or not. can you please retrigger e2e or confirm they are good from local if you have already validated them. Thanks

@jessesuen
Copy link
Member Author

can you please retrigger e2e or confirm they are good from local if you have already validated them. Thanks

it was due to flake. It actually passed in prev submission.

Copy link
Contributor

@alexmt alexmt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jessesuen jessesuen merged commit 0f93f9b into argoproj:master Jan 14, 2022
@jessesuen jessesuen deleted the canary-approximation branch January 14, 2022 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Total replica count should be capped during a rollout
4 participants