Metric label node_scheduling_availability does not update with changes. #8965

Closed
angrycub opened this issue Sep 25, 2020 · 6 comments · Fixed by #14125 or #14483

Comments

@angrycub
Contributor

Nomad version

Nomad v0.12.5

Issue

The node_scheduling_eligibility label on the Nomad client metrics does not update as the eligibility status is changed.

Reproduction steps

Create a telemetry.nomad file with the basic configuration

telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

Run a dev agent with the telemetry configuration

nomad agent -dev -config=telemetry.nomad

Watch a metric with the label

watch "curl -s http://localhost:4646/v1/metrics | jq '.Gauges[] | select(.Name==\"nomad.client.uptime\")'"

Note that the node_scheduling_eligibility label reads eligible. Mark the node ineligible:

nomad node eligibility -self -disable

Watch for a stat update and observe that the label still reads eligible.
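The "watch for a stat update" step can also be scripted. The following is a minimal sketch, not part of the original report: `wait_for_label` and `current_label` are hypothetical helper names, `POLL_INTERVAL` is an assumed variable, and the address is the dev agent's default used in the steps above. The jq filter is the same one used elsewhere in this thread to read the label.

```shell
# wait_for_label EXPECTED ATTEMPTS CMD...
# Re-runs CMD until it prints EXPECTED, giving up after ATTEMPTS checks.
# (Hypothetical helper; POLL_INTERVAL is an assumed knob, default 1 second.)
wait_for_label() {
  wfl_expected="$1"
  wfl_attempts="$2"
  shift 2
  wfl_actual=""
  wfl_i=0
  while [ "$wfl_i" -lt "$wfl_attempts" ]; do
    wfl_actual="$("$@")"
    if [ "$wfl_actual" = "$wfl_expected" ]; then
      echo "label is now $wfl_expected"
      return 0
    fi
    sleep "${POLL_INTERVAL:-1}"
    wfl_i=$((wfl_i + 1))
  done
  echo "label still '$wfl_actual' after $wfl_attempts checks, expected '$wfl_expected'"
  return 1
}

# Reads the eligibility label from the nomad.client.uptime gauge on a dev agent.
current_label() {
  curl -s http://localhost:4646/v1/metrics |
    jq -r '.Gauges[] | select(.Name=="nomad.client.uptime") | .Labels.node_scheduling_eligibility'
}

# Usage (requires a running agent, curl, and jq):
#   nomad node eligibility -self -disable
#   wait_for_label ineligible 30 current_label
```

On an affected version the helper would be expected to time out and return non-zero, since the label never leaves eligible.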

@tgross
Member

tgross commented Sep 28, 2020

Note: this was originally reported internally on v0.11.1+ent. Preliminary theory is that the metrics added in #6130 are treating the client config as authoritative for the metrics value, but that the client doesn't necessarily have the correct view of the world for scheduling eligibility.

cc @pete-woods as this may be silently impacting dashboards in your org.

@parberge

parberge commented Mar 3, 2021

We have observed the same thing running with version v1.0.4. Happy to provide more information if required.

@gmichalec-pandora

Just an update that this is still present on the latest v1.1.0-beta1:

gmichalec@gmichalec:~$ curl -s http://dc6-docker5:4646/v1/nodes | jq -r '.[] | select(.Name=="dc6-docker5") | .SchedulingEligibility'
ineligible
gmichalec@gmichalec:~$ curl -s http://dc6-docker5:4646/v1/metrics | jq -r '.Gauges[] | select(.Name == "nomad.client.uptime") | select(.Labels.host == "dc6-docker5") | .Labels.node_scheduling_eligibility'
eligible
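The two queries above can be folded into a single check that flags a stale label. This is only a sketch under stated assumptions: the function names are hypothetical, and `NOMAD_ADDR` is assumed to point at an agent that serves both `/v1/nodes` and `/v1/metrics` for the node in question, as in the commands above.

```shell
# Assumed default; override NOMAD_ADDR to target another agent.
NOMAD_ADDR="${NOMAD_ADDR:-http://localhost:4646}"

# Eligibility as reported by the nodes API for the named node.
api_eligibility() {
  curl -s "$NOMAD_ADDR/v1/nodes" |
    jq -r --arg name "$1" '.[] | select(.Name == $name) | .SchedulingEligibility'
}

# Eligibility as carried by the label on the nomad.client.uptime gauge.
metric_eligibility() {
  curl -s "$NOMAD_ADDR/v1/metrics" |
    jq -r --arg host "$1" '.Gauges[]
      | select(.Name == "nomad.client.uptime")
      | select(.Labels.host == $host)
      | .Labels.node_scheduling_eligibility'
}

# Pure comparison of the two values: returns 0 when they agree, 1 when
# the metric label disagrees with the API.
compare_eligibility() {
  if [ "$1" = "$2" ]; then
    echo "ok: both report $1"
  else
    echo "stale: api=$1 metric=$2"
    return 1
  fi
}

# Usage (requires a running agent, curl, and jq):
#   compare_eligibility "$(api_eligibility dc6-docker5)" "$(metric_eligibility dc6-docker5)"
```

With the values shown above, the comparison would report the label as stale.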

@usovamaria

The problem also reproduces with nomad node status -self. The node's drain status does not update after a drain is enabled:

root@<host>:<dir># nomad node status -self -short -json 2>/dev/null | jq .Drain
false
root@<host>:<dir># nomad node drain -self -enable -deadline 120s -detach
root@<host>:<dir># nomad node status -self -short -json 2>/dev/null | jq .Drain
false

@resmo
Contributor

resmo commented Apr 9, 2022

We see this in v1.2.6 as well.

@github-actions

github-actions bot commented Jan 7, 2023

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2023