-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for global circuit budget #6013
Conversation
Solves projectcontour#6001 Signed-off-by: Sotiris Nanopoulos <[email protected]>
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #6013 +/- ##
==========================================
+ Coverage 78.70% 78.77% +0.07%
==========================================
Files 138 138
Lines 19717 19745 +28
==========================================
+ Hits 15518 15555 +37
+ Misses 3895 3887 -8
+ Partials 304 303 -1
|
Signed-off-by: Sotiris Nanopoulos <[email protected]>
removing @clayton-gonsalves since he is on PAT leave (if you see this feel free to review anyway). Adding @shadialtarsha and @seth-epps as well |
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code makes sense to me.
I have 2 questions:
- What will happen if not all
GlobalCircuitBreakerDefaults
attributes are defined? I think they will have the value 0 and we will default to Envoy settings again which makes sense but I would clarify this in the doc. - Wondering if we can write e2e tests for this feature. I know it might be tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @davinci26! Overall this makes sense to me, added some comments.
internal/metrics/metrics.go
Outdated
@@ -87,6 +88,8 @@ const ( | |||
statusUpdateConflict = "contour_status_update_conflict_total" | |||
statusUpdateNoop = "contour_status_update_noop_total" | |||
statusUpdateDurationSeconds = "contour_status_update_duration_seconds" | |||
|
|||
ContourCircuitBreakerSettings = "contour_dag_circuit_breaker_settings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why you want to expose this as a metric? We don't really have any other metrics like this, so it's a little unusual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why you want to expose this as a metric?
We want to alert and build graphs on this as a defence in depth. We have discovered during a deployment that we accidentally deleted those annotations and that caused issues and that was really painful to debug.
Basically it helps with the discoverability of what people are setting across the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I would expect something like this to be exposed via resource Status or something similar rather than a Contour xDS server metric
Also I believe we have an issue/PR to expand on cluster circuit breaker metrics: https://www.envoyproxy.io/docs/envoy/latest/configuration/upstream/cluster_manager/cluster_stats#circuit-breakers-statistics (here: #5950)
I don't want to hold up the rest of the change, so maybe lets pull the metrics part out into another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I would expect something like this to be exposed via resource Status or something similar rather than a Contour xDS server metric
Personally I take a more liberal approach to metrics in the sense that, if contour
makes the decision to set a specific value it would be useful for operators to know the decision at runtime, especially if it affect things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to hold up the rest of the change, so maybe lets pull the metrics part out into another PR?
I won't die on this hill :D, So I can separate the changes into two PRs. But my general take is that adding the metric is pretty "cheap" and overall useful. I.e. there have been cases where I wish I had this metric tracked.
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
@skriss this is ready for another review round |
@skriss unless there is an embarrassing bug somewhere this is ready |
@skriss maybe we close/merge this today before the holiday break? |
Looks like we got some merge conflicts, if you can resolve those I'll take another look |
Signed-off-by: Sotiris Nanopoulos <[email protected]>
@skriss merge conflicts resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, I'm approving with comments that need to be addressed but can be done in a follow-up PR if preferred. Will leave for another maintainer to take a look as well. Thanks @davinci26!
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
@projectcontour/maintainers can someone PTAL as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think everything LGTM minus the metric part, I wonder if we can pull that out into a separate PR so we can discuss a little more?
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Will do it in a separate PR |
there is more cleanup to be made |
Signed-off-by: Sotiris Nanopoulos <[email protected]>
I think I got all the references now |
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Also, solves #1192. Doesn't it? |
Solves #6001