Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Initial Prometheus Metrics #47

Merged
merged 1 commit into from
Mar 27, 2019
Merged

Add Initial Prometheus Metrics #47

merged 1 commit into from
Mar 27, 2019

Conversation

dthomson25
Copy link
Member

Addresses #29 by introducing a framework to expose Prothemeus metrics. Adds metrics for the following:

  • Reconciliation times for individual rollouts
  • Reconciliation Error counters
  • Invalid Spec counters
  • Rollout creation time
  • General Golang metrics

This PR did not include metrics on switching services and incrementing step index counters. This will be incorporated with the work done with #32

@codecov-io
Copy link

codecov-io commented Mar 25, 2019

Codecov Report

Merging #47 into master will decrease coverage by 0.19%.
The diff coverage is 71.69%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #47     +/-   ##
=========================================
- Coverage   75.76%   75.56%   -0.2%     
=========================================
  Files          13       14      +1     
  Lines        1304     1404    +100     
=========================================
+ Hits          988     1061     +73     
- Misses        239      265     +26     
- Partials       77       78      +1
Impacted Files Coverage Δ
controller/bluegreen.go 75.42% <ø> (ø) ⬆️
utils/defaults/defaults.go 100% <100%> (ø) ⬆️
controller/sync.go 66.66% <100%> (+0.15%) ⬆️
controller/controller.go 44.94% <36.36%> (-1.36%) ⬇️
controller/metrics/metrics.go 79.22% <79.22%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8c9c38...cd518fc. Read the comment docs.

@dthomson25 dthomson25 force-pushed the prometheus-metrics branch 3 times, most recently from 6ae22c7 to b52044c Compare March 26, 2019 20:48
@dthomson25
Copy link
Member Author

@@ -20,73 +21,73 @@ func (c *Controller) rolloutBlueGreen(r *v1alpha1.Rollout, rsList []*appsv1.Repl
logCtx := logutil.WithRollout(r)
newRS, oldRSs, err := c.getAllReplicaSetsAndSyncRevision(r, rsList, true)
if err != nil {
return err
return c.metricsServer.IncError(r, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All errors eventually bubble up to https://github.com/argoproj/argo-rollouts/blob/master/controller/controller.go#L223 . I think it makes sense to report metrics in one place instead of spreading it.

if err := c.syncHandler(key); err != nil {
			// Put the item back on the workqueue to handle any transient errors.
			c.workqueue.AddRateLimited(key)
			c.metricsServer.IncError(r, err)
			return fmt.Errorf("error syncing '%s': %s, requeuing", key, err.Error())
		}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that makes sense. One side effort of this approach is that I will need to add an extra counter for reconciliation errors because I don't have the rollout variable to get the required strategy label at https://github.com/argoproj/argo-rollouts/blob/master/controller/controller.go#L223. I could use the informer to get the rollout, but that feels like the controller is doing too much work during the error handling. However, I don't think that side effort is bad though. The new counter will only be missing the strategy field, and in this case, the operator will need to check the controller's logs for more information. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both approaches are acceptable. Please choose one which you like the best.

@@ -346,7 +347,8 @@ func CreateTwoWayMergePatch(orig, new, dataStruct interface{}) ([]byte, bool, er
}

// persistRolloutStatus persists updates to rollout status. If no changes were made, it is a no-op
func (c *Controller) persistRolloutStatus(orig *v1alpha1.Rollout, newStatus *v1alpha1.RolloutStatus, newPause *bool) error {
func (c *Controller) persistRolloutStatus(orig *v1alpha1.Rollout, newStatus *v1alpha1.RolloutStatus, newPause *bool, phase metrics.ReconcilePhase) error {
c.metricsServer.IncPhase(orig, phase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RolloutStatus has or supposed to have all information which describes current state of rollout. Instead of imperatively set it in code we can infer the state within persistRolloutStatus method. Most likely you can check InvalidSpec and Available condition and make decision about current phase. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach a lot more. The only issue is that the Available condition needs to be cleaned up (as per #41), but I think it makes sense to go start with this implementation and work on that #41 afterward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor

@alexmt alexmt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dthomson25 dthomson25 merged commit e0c77c0 into master Mar 27, 2019
@dthomson25 dthomson25 deleted the prometheus-metrics branch March 27, 2019 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants