Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Change completed condition to healthy condition and add a new completed condition #2147

Closed

Conversation

zachaller
Copy link
Collaborator

@zachaller zachaller commented Jul 18, 2022

closes #2103

Signed-off-by: zachaller [email protected]

@codecov
Copy link

codecov bot commented Jul 18, 2022

Codecov Report

Merging #2147 (ee78ef0) into master (c4205e8) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2147      +/-   ##
==========================================
+ Coverage   82.21%   82.22%   +0.01%     
==========================================
  Files         121      121              
  Lines       17865    17966     +101     
==========================================
+ Hits        14687    14772      +85     
- Misses       2413     2423      +10     
- Partials      765      771       +6     
Impacted Files Coverage Δ
rollout/sync.go 80.05% <100.00%> (+0.22%) ⬆️
utils/conditions/conditions.go 81.06% <100.00%> (+0.29%) ⬆️
analysis/controller.go 48.48% <0.00%> (-3.69%) ⬇️
rollout/restart.go 96.73% <0.00%> (-1.97%) ⬇️
rollout/trafficrouting/istio/controller.go 50.81% <0.00%> (-1.63%) ⬇️
experiments/controller.go 65.75% <0.00%> (-1.54%) ⬇️
rollout/trafficrouting.go 75.48% <0.00%> (-1.08%) ⬇️
rollout/controller.go 77.47% <0.00%> (-0.29%) ⬇️
controller/metrics/metrics.go 100.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2c1d8a...ee78ef0. Read the comment docs.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 18, 2022

Go Published Test Results

1 717 tests   1 717 ✔️  2m 32s ⏱️
   101 suites         0 💤
       1 files           0

Results for commit ee78ef0.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 18, 2022

E2E Tests Published Test Results

  1 files    1 suites   44m 59s ⏱️
88 tests 83 ✔️ 3 💤 2
90 runs  85 ✔️ 3 💤 2

For more details on these failures, see this check.

Results for commit ee78ef0.

♻️ This comment has been updated with latest results.

Signed-off-by: zachaller <[email protected]>
Signed-off-by: zachaller <[email protected]>
Signed-off-by: zachaller <[email protected]>
Signed-off-by: zachaller <[email protected]>
@zachaller zachaller marked this pull request as ready for review July 19, 2022 19:53
Signed-off-by: zachaller <[email protected]>
@zachaller zachaller added this to the v1.3 milestone Jul 22, 2022
@harikrongali harikrongali requested a review from jessesuen July 22, 2022 19:38
@harikrongali
Copy link
Contributor

lgtm. @jessesuen please review when you get a chance

Signed-off-by: zachaller <[email protected]>
@sonarcloud
Copy link

sonarcloud bot commented Jul 27, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
2.2% 2.2% Duplication

Copy link
Member

@jessesuen jessesuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are introducing new condition for Healthy vs. Completed I feel like we need to also introduce and emit the corresponding Kubernetes event for when a Rollout completes an update.

This also means we won't technically break backwards compatibility for users who are looking for that event.

Comment on lines +982 to 985
// RolloutHealthy means that rollout is in a completed state. It is still progressing at this point.
RolloutHealthy RolloutConditionType = "Healthy"
// RolloutCompleted means that rollout is in a completed state. It is still progressing at this point.
RolloutCompleted RolloutConditionType = "Completed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need to fix the descriptions. I suggest:

	// RolloutHealthy indicates that the rollout is fully updated and all of its pods are available. A Healthy rollout implies that it is also Completed (but not vice versa).
	RolloutHealthy RolloutConditionType = "Healthy"
	// RolloutCompleted indicates that the rollout has completed its update to the desired version. A rollout can be both Completed as well as Degraded/Progressing (e.g. if its Pods are unavailable sometime after a completed update).
	RolloutCompleted RolloutConditionType = "Completed"

@@ -107,7 +107,7 @@ spec:
"RolloutStepCompleted", // Rollout step 2/2 completed (pause: 3s)
"RolloutResumed", // Rollout is resumed
"ScalingReplicaSet", // Scaled down ReplicaSet abort-retry-promote-698fbfb9dc (revision 1) from 1 to 0
"RolloutCompleted", // Rollout completed update to revision 2 (75dcb5ddd6): Completed all 2 canary steps
"RolloutHealthy", // Rollout healthy update to revision 2 (75dcb5ddd6): Completed all 2 canary steps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we see the RolloutCompleted in the events?

Comment on lines +67 to +71
// RolloutHealthyReason is added in a rollout when it is healthy.
RolloutHealthyReason = "RolloutHealthy"
// RolloutHealthyMessage is added when the rollout is healthy
RolloutHealthyMessage = "Rollout healthy update to revision %d (%s): %s"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including revision number doesn't make sense anymore for when a Rollout becomes Healthy. I feel we need to emit two events, one for each condition (Healthy vs. Completed). Users will care separately when a Rollout finishes an update, but they will also care about from when a Rollout degrades (e.g. because of a pod restart) and subsequently recovers.

Copy link
Collaborator Author

@zachaller zachaller Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place this is used is here https://github.com/argoproj/argo-rollouts/pull/2147/files/ee78ef0b85d3af3b21cd533930dc10b35b70c78d#diff-594206431376a15241cfdd95572e5d6e586929f867ca28ff63d1f679c0334d77R936-R938 which I added a TODO as well because I did not think it was the correct event to emit. I will look at removing that message and changing that event to something that makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. I think that part of the code should not change and we should continue to emit RolloutCompletedReason when we switch stable hash to current hash. Not that the rollout is Healthy.

@zachaller zachaller marked this pull request as draft July 28, 2022 16:16
@zachaller zachaller closed this Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Completed Event/Condition should only trigger during updates
3 participants