-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Initial Prometheus Metrics #47
Conversation
Codecov Report
@@ Coverage Diff @@
## master #47 +/- ##
=========================================
- Coverage 75.76% 75.56% -0.2%
=========================================
Files 13 14 +1
Lines 1304 1404 +100
=========================================
+ Hits 988 1061 +73
- Misses 239 265 +26
- Partials 77 78 +1
Continue to review full report at Codecov.
|
6ae22c7
to
b52044c
Compare
controller/bluegreen.go
Outdated
@@ -20,73 +21,73 @@ func (c *Controller) rolloutBlueGreen(r *v1alpha1.Rollout, rsList []*appsv1.Repl | |||
logCtx := logutil.WithRollout(r) | |||
newRS, oldRSs, err := c.getAllReplicaSetsAndSyncRevision(r, rsList, true) | |||
if err != nil { | |||
return err | |||
return c.metricsServer.IncError(r, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All errors eventually bubble up to https://github.com/argoproj/argo-rollouts/blob/master/controller/controller.go#L223 . I think it makes sense to report metrics in one place instead of spreading it.
if err := c.syncHandler(key); err != nil {
// Put the item back on the workqueue to handle any transient errors.
c.workqueue.AddRateLimited(key)
c.metricsServer.IncError(r, err)
return fmt.Errorf("error syncing '%s': %s, requeuing", key, err.Error())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that makes sense. One side effort of this approach is that I will need to add an extra counter for reconciliation errors because I don't have the rollout variable to get the required strategy label at https://github.com/argoproj/argo-rollouts/blob/master/controller/controller.go#L223. I could use the informer to get the rollout, but that feels like the controller is doing too much work during the error handling. However, I don't think that side effort is bad though. The new counter will only be missing the strategy field, and in this case, the operator will need to check the controller's logs for more information. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both approaches are acceptable. Please choose one which you like the best.
controller/sync.go
Outdated
@@ -346,7 +347,8 @@ func CreateTwoWayMergePatch(orig, new, dataStruct interface{}) ([]byte, bool, er | |||
} | |||
|
|||
// persistRolloutStatus persists updates to rollout status. If no changes were made, it is a no-op | |||
func (c *Controller) persistRolloutStatus(orig *v1alpha1.Rollout, newStatus *v1alpha1.RolloutStatus, newPause *bool) error { | |||
func (c *Controller) persistRolloutStatus(orig *v1alpha1.Rollout, newStatus *v1alpha1.RolloutStatus, newPause *bool, phase metrics.ReconcilePhase) error { | |||
c.metricsServer.IncPhase(orig, phase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RolloutStatus
has or supposed to have all information which describes current state of rollout. Instead of imperatively set it in code we can infer the state within persistRolloutStatus
method. Most likely you can check InvalidSpec
and Available
condition and make decision about current phase. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
b52044c
to
f9707e8
Compare
f9707e8
to
cd518fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Addresses #29 by introducing a framework to expose Prothemeus metrics. Adds metrics for the following:
This PR did not include metrics on switching services and incrementing step index counters. This will be incorporated with the work done with #32