ROX-21046: Alerts for tenant nearing OOM #172
Have you checked how close we are to hitting these alerts on prod right now? I've observed high memory usage on scanner pods in particular. See also https://redhat-internal.slack.com/archives/C0313JYKH8W/p1700683356311839?thread_ts=1700679380.899679&cid=C0313JYKH8W. I'm afraid this will be a very busy alert as is.
- name: tenant-resources
  rules:
    - expr: |
        sum(container_memory_max_usage_bytes{namespace=~"rhacs-.{20}",container!="POD",container!=""}) by (namespace, container, pod)
Suggested change:
- sum(container_memory_max_usage_bytes{namespace=~"rhacs-.{20}",container!="POD",container!=""}) by (namespace, container, pod)
+ sum(container_memory_working_set_bytes{namespace=~"rhacs-.{20}",container!="POD",container!=""}) by (namespace, container, pod)
Fixed. I checked whether we are currently hitting the alerts, and we are for only one Central instance.
But that means the alert will fire immediately? I think we need to make sure this alert is not noisy, so we need a memory buffer on all Centrals.
I added resources to the one Central that would have triggered the alert.
Please keep an eye on this alert and tighten it if it gets noisy.
Adding alerts for tenant containers that are about to OOM.
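For readers following the thread, here is a rough sketch of how a recording rule on `container_memory_working_set_bytes` could feed a near-OOM alert. Only the group name, the namespace selector, and the `sum(...) by (namespace, container, pod)` expression come from the diff above. The rule name, alert name, 90% threshold, `for:` duration, severity label, and the use of the kube-state-metrics series `kube_pod_container_resource_limits` as the denominator are all assumptions for illustration, not the rules actually merged in this PR.

```yaml
groups:
  - name: tenant-resources
    rules:
      # Recording rule: current working-set memory per tenant container.
      # The rule name is hypothetical; the expression is the one from the review suggestion.
      - record: tenant:container_memory_working_set_bytes:sum
        expr: |
          sum(container_memory_working_set_bytes{namespace=~"rhacs-.{20}",container!="POD",container!=""}) by (namespace, container, pod)

      # Alert sketch: fire when usage stays above 90% of the memory limit.
      # Alert name, threshold, duration, and severity are assumed values.
      - alert: TenantContainerNearOOM
        expr: |
          tenant:container_memory_working_set_bytes:sum
            /
          sum(kube_pod_container_resource_limits{namespace=~"rhacs-.{20}",resource="memory"}) by (namespace, container, pod)
            > 0.9
        # Require the condition to hold for a while so short spikes do not page anyone.
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is at {{ $value | humanizePercentage }} of its memory limit."
```

The `for:` clause and the threshold are the two knobs that address the noise concern raised in the review: raising either makes the alert quieter at the cost of less warning time before an OOM kill.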