Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add edge case handling to traffic routing #1190

Merged
merged 4 commits into from
May 18, 2021
Merged

fix: Add edge case handling to traffic routing #1190

merged 4 commits into from
May 18, 2021

Conversation

zezaeoh
Copy link
Contributor

@zezaeoh zezaeoh commented May 17, 2021

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@zezaeoh zezaeoh changed the title fix(controller): Add edge case handling to traffic routing fix: Add edge case handling to traffic routing May 17, 2021
Comment on lines 101 to 104
} else if c.pauseContext.IsAborted() {
// when promote aborted. desired canary weight should be 0
} else if c.newRS.Status.AvailableReplicas == 0 {
// when newRS available replicas num is 0. never weight to canary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is significant, and a surprising oversight if we are not handling this correctly. I will take some time to understand the implications of the change.

@jessesuen
Copy link
Member

Looks like we have a few tests (both unit and e2e) that are asserting the opposite of this change. So at the very least the existing tests would need to change.

But we would also want new tests to verify this change. This is assuming the change is correct -- I still need to play around and what the behavior should be vs. what is actually happening.

@sonarcloud
Copy link

sonarcloud bot commented May 18, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@zezaeoh
Copy link
Contributor Author

zezaeoh commented May 18, 2021

Hi, @jessesuen
Thanks a lot for your review.

I would like to cover next cases with this changes

  1. abort case
	} else if c.pauseContext.IsAborted() {		
		// when promote aborted. desired canary weight should be 0	
	} ...

when promotion is aborted, controller scale down canary replicas before change canary weight to 0
because of this logic

                atDesiredReplicaCount := replicasetutil.AtDesiredReplicaCountsForCanary(c.rollout, c.newRS, c.stableRS, c.otherRSs)
		if !atDesiredReplicaCount {
			// Use the previous weight since the new RS is not ready for a new weight
			for i := *index - 1; i >= 0; i-- {
				step := c.rollout.Spec.Strategy.Canary.Steps[i]
				if step.SetWeight != nil {
					desiredWeight = *step.SetWeight
					break
				}
			}
		} ...

this behavior leads some requests drop

  1. when promotion aborted and then start promotefull action
	} else if c.newRS == nil || c.newRS.Status.AvailableReplicas == 0 {
		// when newRS available replicas num is 0. never weight to canary
	} ...

Suppose we have a rollout resource with a canary strategy like this:

    canary:
      steps:
        - setWeight: 20
        - pause: {}
        - setWeight: 40
        - pause: {}
        - setWeight: 60
        - pause: {}
        - setWeight: 100
        - pause: {}

when promotion is aborted, then canary service's endpoint will be empty.
If promotefull action is executed in this situation,
despite the fact that newRS has just been created, trafficrouting sets the canary weight to 100

this behavior leads requests drop

@zezaeoh zezaeoh requested a review from jessesuen May 18, 2021 05:26
@codecov
Copy link

codecov bot commented May 18, 2021

Codecov Report

Merging #1190 (3e1f571) into master (9c1a746) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
+ Coverage   81.01%   81.05%   +0.03%     
==========================================
  Files         105      105              
  Lines        9404     9459      +55     
==========================================
+ Hits         7619     7667      +48     
- Misses       1265     1270       +5     
- Partials      520      522       +2     
Impacted Files Coverage Δ
rollout/trafficrouting.go 88.46% <100.00%> (+0.30%) ⬆️
rollout/trafficrouting/ambassador/ambassador.go 83.58% <100.00%> (+1.62%) ⬆️
pkg/kubectl-argo-rollouts/cmd/lint/lint.go 67.24% <0.00%> (-2.53%) ⬇️
rollout/controller.go 75.44% <0.00%> (-0.55%) ⬇️
utils/conditions/conditions.go 78.94% <0.00%> (ø)
experiments/replicaset.go 82.67% <0.00%> (+0.13%) ⬆️
utils/record/record.go 97.61% <0.00%> (+0.56%) ⬆️
rollout/sync.go 75.90% <0.00%> (+0.59%) ⬆️
rollout/trafficrouting/istio/controller.go 45.73% <0.00%> (+2.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c1a746...3e1f571. Read the comment docs.

@jessesuen
Copy link
Member

Thanks for the explanation. The change makes sense now and looks like all tests are passing.

@jessesuen jessesuen merged commit ed39758 into argoproj:master May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants