Add metrics for blocked eval resources #10454
The overall approach LGTM, and it'll be nice to have a place to hook in more metrics around scheduler decisions so 👍 on that.
LGTM!
When an eval is blocked, it's often because there are insufficient resources available. Nomad currently tracks how many evals are blocked, but it has no indication of how much resource is required to unblock them, or where they are assigned to run.
This PR adds a new set of metrics that report how many resources have been requested by blocked evals. The metrics are broken down by job, datacenter, and node class.
If a job spans multiple datacenters or node classes, the metrics for each of them are updated, since adding more resources to any one of them would be enough to unblock the eval.
This metric is emitted by the leader, since it is the only node in the cluster that has the proper scheduling information.
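To illustrate the accounting described above, here is a minimal sketch of how requested resources could be summed per job, datacenter, and node class. It is not the code from this PR: the `Resources`, `BlockedEval`, and `BlockedSummary` types and their fields are hypothetical simplifications, and the sketch only accumulates totals rather than emitting them through a metrics library.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource summary.
// It is not Nomad's actual resource type.
type Resources struct {
	CPU      int // MHz
	MemoryMB int
}

// Add returns the element-wise sum of two resource summaries.
func (r Resources) Add(o Resources) Resources {
	return Resources{CPU: r.CPU + o.CPU, MemoryMB: r.MemoryMB + o.MemoryMB}
}

// BlockedEval is a hypothetical view of a blocked evaluation:
// the job it belongs to, where it may run, and what it requested.
type BlockedEval struct {
	JobID       string
	Datacenters []string
	NodeClasses []string
	Requested   Resources
}

// BlockedSummary accumulates requested resources along the three
// dimensions named in the description.
type BlockedSummary struct {
	ByJob        map[string]Resources
	ByDatacenter map[string]Resources
	ByNodeClass  map[string]Resources
}

// NewBlockedSummary returns an empty summary with initialized maps.
func NewBlockedSummary() *BlockedSummary {
	return &BlockedSummary{
		ByJob:        make(map[string]Resources),
		ByDatacenter: make(map[string]Resources),
		ByNodeClass:  make(map[string]Resources),
	}
}

// Add records one blocked eval. Every datacenter and node class the
// eval could run in is credited with the full request, because adding
// capacity to any one of them would be enough to unblock it.
func (s *BlockedSummary) Add(e BlockedEval) {
	s.ByJob[e.JobID] = s.ByJob[e.JobID].Add(e.Requested)
	for _, dc := range e.Datacenters {
		s.ByDatacenter[dc] = s.ByDatacenter[dc].Add(e.Requested)
	}
	for _, nc := range e.NodeClasses {
		s.ByNodeClass[nc] = s.ByNodeClass[nc].Add(e.Requested)
	}
}

func main() {
	s := NewBlockedSummary()
	s.Add(BlockedEval{
		JobID:       "web",
		Datacenters: []string{"dc1", "dc2"},
		NodeClasses: []string{"large"},
		Requested:   Resources{CPU: 500, MemoryMB: 256},
	})
	s.Add(BlockedEval{
		JobID:       "db",
		Datacenters: []string{"dc1"},
		Requested:   Resources{CPU: 1000, MemoryMB: 1024},
	})
	fmt.Printf("dc1: %+v\n", s.ByDatacenter["dc1"])
	fmt.Printf("dc2: %+v\n", s.ByDatacenter["dc2"])
}
```

Because one request is credited to every eligible datacenter and node class, the per-datacenter totals can sum to more than the per-job totals. In this sketch that is intended: each figure answers "how much capacity is being asked of this location", not "how much is needed cluster-wide".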
Missing work: