Add metric to track Unhealthy Stale machine removal count #808
/area monitoring
Here's a capture of the exported metric visibility:
cc: @dmahmalatsap @istvanballok @rickardsjp @himanshu-kun @rishabh-11
/assign
There are many worker goroutines that reconcile the machineSets, and the current metric is per machineSet, so I don't think this is a thread-safe way of keeping count. I'll have to discuss it internally and research it before I can comment further.
Otherwise, I have requested some changes.
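To make the thread-safety concern concrete, here is a minimal sketch of a per-machineSet count that stays correct when several worker goroutines increment it concurrently. It is not the code from this PR and does not use the Prometheus client library; `staleRemovalCounter`, `Inc`, `Get`, and `simulateWorkers` are hypothetical names chosen for illustration, and a mutex-guarded map is only one possible approach.

```go
package main

import (
	"fmt"
	"sync"
)

// staleRemovalCounter is a hypothetical helper (not part of MCM) that keeps
// one count per machineSet name and guards the map with a mutex.
type staleRemovalCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newStaleRemovalCounter() *staleRemovalCounter {
	return &staleRemovalCounter{counts: make(map[string]uint64)}
}

// Inc records one removed machine for the given machineSet.
func (c *staleRemovalCounter) Inc(machineSet string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[machineSet]++
}

// Get returns the current count for the given machineSet (0 if unknown).
func (c *staleRemovalCounter) Get(machineSet string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[machineSet]
}

// simulateWorkers stands in for several reconcile workers that each report
// removalsPerWorker removals for the same machineSet at the same time.
func simulateWorkers(c *staleRemovalCounter, machineSet string, workers, removalsPerWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < removalsPerWorker; i++ {
				c.Inc(machineSet)
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := newStaleRemovalCounter()
	simulateWorkers(c, "machineset-a", 8, 1000)
	fmt.Println(c.Get("machineset-a")) // prints 8000
}
```

Without the mutex, the concurrent map writes would be a data race, which `go run -race` reports.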
771fe34 to 824c3a2
Nice work.
Thanks for the refactor and for keeping us in mind in the process!
Looks good. Thanks for the changes.
/lgtm
/lgtm
approving my own contribution to the PR :')
Note: Since this metric is exposed by MCM core, it does not need to be vendored in the mcm-providers to start being exposed, and it will be available in the next MCM release (v0.50.0).
What this PR does / why we need it:
The machineSet controller collects two kinds of stale machines. With this metric, we want to expose the count of machines that are stale due to unhealthiness and get terminated between two scrapes by Prometheus.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
I have tested the code for different scenarios like:
Release note: