-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: Add additional operational excellence metrics #156
Conversation
9dd9b29
to
2d09a1f
Compare
Signed-off-by: chandankumar4 <[email protected]>
2d09a1f
to
dbe042d
Compare
Signed-off-by: chandankumar4 <[email protected]>
why this doesn't include the total number of ISB service synced, etc? |
@@ -84,19 +85,24 @@ func NewNumaflowControllerRolloutReconciler( | |||
kubectl kubeUtil.Kubectl, | |||
customMetrics *metrics.CustomMetrics, | |||
) (*NumaflowControllerRolloutReconciler, error) { | |||
stateCache := sync.NewLiveStateCache(rawConfig) | |||
stateCache := sync.NewLiveStateCache(rawConfig, customMetrics) | |||
newRawConfig := metrics.AddMetricsTransportWrapper(customMetrics, rawConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we call this function here as opposed to from main.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now I see in the issue it says we're counting these only for NumaflowControllerRollout, but i'm not sure why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numaflow Controller is through gitops sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that we need to hold up this PR for this, but why do we only want to count "Number of kubernetes requests executed during reconciliation" for Numaflow Controller and not for ISBService and Pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think we would want all, @chandankumar4 can you create an follow up issue to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have covered it in this PR
for any metrics that are particular to Numaflow Controller can we have "numaflow_controller" or "numaflow_ctlr" in the name? |
Why is |
Fixed, It was missed register that variable. @xdevxy |
41742d5
to
3324bae
Compare
|
Signed-off-by: chandankumar4 <[email protected]>
3324bae
to
5b06851
Compare
Signed-off-by: chandankumar4 <[email protected]>
I think we initially thought |
@@ -23,6 +23,7 @@ require ( | |||
) | |||
|
|||
require ( | |||
github.com/argoproj/pkg v0.13.7-0.20230626144333-d56162821bd1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using it here if we don't wanted to use then need to copy that fn here with their dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do the open comments in follow ups.
return ctrl.Result{}, statusUpdateErr | ||
} | ||
|
||
r.customMetrics.ISBServicesSyncFailed.WithLabelValues().Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be done just once above if statusUpdateErr != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For having it only once, how It will cover if statusUpdateErr!=nil and err != nil, it should be called twice?
@@ -317,6 +327,7 @@ func (r *ISBServiceRolloutReconciler) processExistingISBService(ctx context.Cont | |||
} | |||
|
|||
isbServiceRollout.Status.MarkDeployed(isbServiceRollout.Generation) | |||
r.customMetrics.ReconciliationDuration.WithLabelValues(ControllerISBSVCRollout, "update").Observe(time.Since(syncStartTime).Seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, we don't know that this is an "update" vs a "no-op", do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, if isbServiceNeedsUpdating {...}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sorry, but what do you think about enclosing it in the else
block that's directly above it? I think then it's very clear that this statement goes with the dataLossPrevention=false
case only and there will be less chance of messing it up later. Otherwise, I had to look closely at the dataLossPrevention=true
code above to confirm that it was in fact returning and would't hit this.
Signed-off-by: chandankumar4 <[email protected]>
btw I have already remove the namespace label from metrics but do we also wanted to revert this change and allow multiple numaplane-controller in one namespace? |
0518436
to
fe59610
Compare
I'm kind of thinking about in a future PR, I wonder if we should consider avoiding doing metrics in the inner code and trying to as much as possible do them in one place, like in |
Signed-off-by: chandankumar4 <[email protected]>
fe59610
to
c9be19a
Compare
Right, we may need to revert that change in future, but if you remove the namespace label in this PR then no need to open an issue for it. The revert will be tracked by no downtime upgrade feature, Thanks! |
Fixes #97
Modifications
Add additional operational excellence metrics
metrics data
Verification