[observability] create an alert when file descriptors exhausted #12935
Comments
I tried different views of the metrics. However, view with the sum of We could consider sending alerts if the sum of Any suggestion to make the alerts send out as soon as possible? @kylos101
@jenting can you share the query you found most promising?
👋 hey @jenting, have you tried an alert like this? https://www.robustperception.io/alerting-on-approaching-open-file-limits/
Hey @jenting, I recommend trying to write an alert that is node or workspace based, rather than cluster based.
For the node-based metric If we write an alert that is node-based, and the alert criterion is current file descriptors / total file descriptors, we can see that the current file descriptor count is far below the total. The grafana query. Therefore, we can't use the criteria
@jenting This incident was caused by ws-manager with PVC after all, right? So I don't think the alert really needs to fire until just before the node's fds are depleted, at around 80% usage. What do you think?
I agree with you. We could set the threshold at 80%. I checked our overall fd usage, and we are far from 80% (if I remember correctly, fd usage is under 10%).
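For reference, the node-level criterion discussed above (current fds / total fds compared against an 80% threshold) could be sketched roughly like this. This is only an illustration, not the alert rule used in the cluster: the function names are hypothetical, and it assumes a Linux node where `/proc/sys/fs/file-nr` holds three fields (allocated handles, unused handles, system-wide maximum).

```python
from typing import Tuple

# Assumed threshold, taken from the 80% figure suggested in this thread.
DEFAULT_THRESHOLD = 0.8


def parse_file_nr(text: str) -> Tuple[int, int]:
    """Return (allocated, maximum) from the contents of /proc/sys/fs/file-nr.

    The file holds three whitespace-separated integers:
    allocated handles, unused handles, and the system-wide maximum.
    """
    allocated, _unused, maximum = (int(field) for field in text.split())
    return allocated, maximum


def fd_usage_ratio(allocated: int, maximum: int) -> float:
    """Fraction of the node's file descriptor limit currently allocated."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return allocated / maximum


def should_alert(allocated: int, maximum: int,
                 threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True once usage reaches the threshold fraction of the limit."""
    return fd_usage_ratio(allocated, maximum) >= threshold


def read_node_fds(path: str = "/proc/sys/fs/file-nr") -> Tuple[int, int]:
    """Read the live counters on a Linux node (hypothetical helper)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())
```

With usage under 10%, as observed here, `should_alert` stays false by a wide margin, which matches the conclusion that a node-wide alert would only fire shortly before actual depletion.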
The problem with the fd of the supervisor should have been a side issue and not the root cause.
Yes, I think the ws-manager failed to handle any pod event.
@utam0k if we no longer need an alert, please close this issue as not planned?
@jenting is there a separate issue that needs to be created to solve "ws-manager failed to handle any pod event"? If yes, can you share whether this is related to PVC or is general? I ask to limit scope, so we can focus on closing this issue (either by creating an alert, or by closing this because we don't need an alert and making a separate issue to track if needed).
No, we don't need to create a new issue to solve "ws-manager failed to handle any pod event". Let's link to the culprit issue #13007 and close this one.
Okay, thanks! I will close this issue as won't fix.
Is your feature request related to a problem? Please describe
Create an alert when file descriptors are exhausted
Describe the behaviour you'd like
Have an alert fire when the file descriptors are exhausted
Describe alternatives you've considered
None
Additional context
https://gitpod.slack.com/archives/C04245JPHKL/p1663083593170859