node_scheduling_eligibility metric is not correct #13549
Comments
Hi @Netlims! I've verified this is broken just as you've said. This code was originally introduced in #6130 and was revised slightly in #8925, but honestly, now that I'm looking at it with some distance, I'm not sure it ever worked architecturally. Scheduling ineligibility is set in the server's view of the node, but we never push that information back to the client on its next heartbeat. We do push a tiny bit of data back about the cluster on each heartbeat in … In any case, this is a clear bug without a good workaround. The metric is misleading unless we either send the ineligibility data back to the client or remove the incorrect label. I'll mark this for roadmapping. Thanks for opening the issue @Netlims!
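The data flow described in the comment above can be illustrated with a toy model: the server holds the authoritative eligibility, but the client builds its metric labels from its own local copy, which the heartbeat response never updates. This is a minimal Python sketch under stated assumptions; every class, field, and method name is hypothetical and none of it is Nomad's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Toy model of the eligibility flow described above.
# All class, field, and method names are hypothetical; this is NOT Nomad's code.


@dataclass
class HeartbeatResponse:
    # Cluster data the server returns on each heartbeat (illustrative only).
    heartbeat_ttl_s: float
    # Only filled in when the server pushes eligibility back (the proposed fix).
    scheduling_eligibility: Optional[str] = None


@dataclass
class Server:
    # The server's authoritative view: node_id -> "eligible" / "ineligible".
    eligibility: Dict[str, str] = field(default_factory=dict)
    # False reproduces the reported bug; True sketches the proposed fix.
    push_eligibility: bool = False

    def set_eligibility(self, node_id: str, value: str) -> None:
        # e.g. an operator marking a node ineligible: only the server's view changes.
        self.eligibility[node_id] = value

    def heartbeat(self, node_id: str) -> HeartbeatResponse:
        resp = HeartbeatResponse(heartbeat_ttl_s=10.0)
        if self.push_eligibility:
            resp.scheduling_eligibility = self.eligibility.get(node_id, "eligible")
        return resp


@dataclass
class Client:
    node_id: str
    # The client's local copy, which is what its metric labels are built from.
    eligibility: str = "eligible"

    def heartbeat(self, server: Server) -> None:
        resp = server.heartbeat(self.node_id)
        # Without the fix this field is always None, so the local copy never changes.
        if resp.scheduling_eligibility is not None:
            self.eligibility = resp.scheduling_eligibility

    def metric_labels(self) -> Dict[str, str]:
        return {
            "node_id": self.node_id,
            "node_scheduling_eligibility": self.eligibility,
        }
```

With `Server()`, a client keeps reporting `eligible` after `set_eligibility("node-1", "ineligible")` followed by a heartbeat; with `Server(push_eligibility=True)`, the label flips to `ineligible` on the next heartbeat. Carrying eligibility in the heartbeat response is one of the two options named in the comment; the other is dropping the label entirely.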
I can confirm it worked, because we had some Grafana reports based on it. I was wondering since when, and why, they suddenly show all CPUs as eligible :)
Edit: we were using InfluxDB + Telegraf for metric collection.
Edit 2: please don't remove this tag, as it's really useful for automated cluster scaling.
Hi.
Any ETA for a fix? This label is useful for filtering metrics in nomad-autoscaler.
Same here, it doesn't seem to be working for us either. We use Prometheus for metrics. It would be awesome to have it working for autoscaling purposes, as @kholisrag mentioned.
Hi,
I'm trying to sample some metrics from the active nodes in the cluster, and for that I'm using `node_scheduling_eligibility` as a filter. I marked two nodes as `ineligible`, but their metrics still report `node_scheduling_eligibility` as `eligible`.
`nomad node status`:
From one of the ineligible nodes, `localhost:4646/v1/metrics?format=prometheus`:
The issue also appears when the metrics are not formatted for Prometheus.
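The filtering described above, and a way to make the mismatch visible, can be sketched in Python. This assumes each client sample carries `node_id` and `node_scheduling_eligibility` labels, and that entries returned by Nomad's `GET /v1/nodes` expose `ID` and `SchedulingEligibility`; the function names are hypothetical, and the regex-based parser handles only simple text-format sample lines, not the full Prometheus exposition format.

```python
import re
from typing import Dict, List, Tuple

# One parsed sample: (metric name, labels, value).
Sample = Tuple[str, Dict[str, str], float]

# metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
# label="value"; escape sequences such as \" are kept as-is (not unescaped).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> List[Sample]:
    """Parse simple Prometheus text-format samples, skipping comments and blanks."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # not a sample line this simple parser understands
        labels = dict(_LABEL_RE.findall(m.group(2) or ""))
        samples.append((m.group(1), labels, float(m.group(3))))
    return samples


def filter_eligible(samples: List[Sample]) -> List[Sample]:
    """Keep samples whose node_scheduling_eligibility label is "eligible"."""
    return [s for s in samples if s[1].get("node_scheduling_eligibility") == "eligible"]


def eligibility_mismatches(
    samples: List[Sample], nodes: List[dict]
) -> Dict[str, Tuple[str, str]]:
    """Compare the metric label with the server's view of each node.

    `nodes` is assumed to be the decoded JSON list from GET /v1/nodes.
    Returns node_id -> (label reported in metrics, eligibility per the API).
    """
    truth = {n["ID"]: n["SchedulingEligibility"] for n in nodes}
    mismatches: Dict[str, Tuple[str, str]] = {}
    for _name, labels, _value in samples:
        node_id = labels.get("node_id")
        reported = labels.get("node_scheduling_eligibility")
        if node_id in truth and reported is not None and reported != truth[node_id]:
            mismatches[node_id] = (reported, truth[node_id])
    return mismatches
```

To run it against a live agent, the text could be fetched with `urllib.request.urlopen("http://localhost:4646/v1/metrics?format=prometheus")` and the node list with `/v1/nodes` (decoded with `json.loads`); clusters with ACLs enabled may also need an `X-Nomad-Token` header. Under the bug reported here, `filter_eligible` keeps samples from nodes marked ineligible, while `eligibility_mismatches` lists exactly those nodes.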
Here is the `telemetry` block from the agent:
I'm using Nomad 1.2.6.
Please tell me if you need anything else.
Thank you