allocatable_memory wrong #910

Open
hsmade opened this issue May 31, 2024 · 2 comments

Comments

hsmade commented May 31, 2024

internal_plugin.nomad-apm: collected node pool resource data: allocated_cpu=2924 allocated_memory=5460 allocatable_cpu=35200 allocatable_memory=15608

However, I only have 3 VMs in my ASG, each with 2 GB of memory, so I would expect allocatable_memory to be roughly 6000, not 15608.
Nomad still lists a lot of clients in the 'ready' state, but those instances no longer exist and the nodes are marked ineligible.

# nomad node-status
ID        Node Pool  DC            Name                                 Class            Drain  Eligibility  Status
54c1d969  default    eu-central-1  worker-services-i-0b39e96bb16ae1c7d  worker-services  false  ineligible   ready
99a91f0c  default    eu-central-1  worker-services-i-0d3bac19ef5aa1d1d  worker-services  false  ineligible   ready
4c394a12  default    eu-central-1  worker-services-i-0582f925f917ec73e  worker-services  false  ineligible   ready
284cc4be  default    eu-central-1  worker-services-i-0dfe7da7ebadd971e  worker-services  false  ineligible   ready
d90e705b  default    eu-central-1  worker-services-i-0a7a4b92ab8d432f7  worker-services  false  eligible     ready
e57184ee  default    eu-central-1  worker-services-i-01e9192ae978c7921  worker-services  false  ineligible   down
1e1c3d34  default    eu-central-1  worker-services-i-03cb071c0352926fe  worker-services  false  ineligible   down
db380668  default    eu-central-1  worker-services-i-007763c53c9ad360c  worker-services  false  eligible     ready
25860043  default    eu-central-1  worker-services-i-0d62b5c0d3689f418  worker-services  false  ineligible   down
d91509f1  default    eu-central-1  worker-services-i-09962f55c4c6d43d7  worker-services  false  ineligible   ready
8e9906f3  default    eu-central-1  worker-services-i-002097b734ca1e620  worker-services  false  ineligible   down
6f615722  default    eu-central-1  worker-services-i-030899c1a2dc0cada  worker-services  false  ineligible   down
24e0a7dc  default    eu-central-1  worker-services-i-0d5f1e86c467bfb34  worker-services  false  ineligible   down
f1af8cee  default    eu-central-1  worker-services-i-02f8f7ce032068ce2  worker-services  false  eligible     ready
ca0f312f  default    eu-central-1  worker-services-i-0df38a7f1dae1914a  worker-services  false  ineligible   down
# nomad node-status 54c1d969

error fetching node stats: Unexpected response code: 404 (No path to node)
ID              = 54c1d969-1409-a1af-bbe8-4af4b67deb3d
Name            = worker-services-i-0b39e96bb16ae1c7d
Node Pool       = default
Class           = worker-services
DC              = eu-central-1
Drain           = false
Eligibility     = ineligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Host Volumes    = <none>
Host Networks   = <none>
CSI Volumes     = <none>
Driver Status   = docker,exec

Node Events
Time                  Subsystem  Message
2024-05-31T14:03:38Z  Drain      Node drain complete
2024-05-31T14:03:36Z  Drain      Node drain complete
2024-05-31T14:03:36Z  Drain      Node drain strategy set
2024-05-31T14:03:19Z  Drain      Node drain complete
2024-05-31T14:03:19Z  Drain      Node drain strategy set
2024-05-31T13:59:33Z  Cluster    Node registered

Allocated Resources
CPU         Memory       Disk
0/4400 MHz  0 B/1.9 GiB  0 B/15 GiB

Allocation Resource Utilization
CPU         Memory
0/4400 MHz  0 B/1.9 GiB

error fetching node stats: actual resource usage not present

Because the allocatable totals are inflated, the autoscaler keeps trying to scale in until it hits my configured minimum.

Author

hsmade commented Aug 7, 2024

Might be related to hashicorp/nomad#13549.

Member

jrasell commented Sep 18, 2024

Hi @hsmade, and thanks for raising this issue. It looks like we do not check a node's eligibility when filtering the nodes that the Nomad APM uses to calculate resource totals. I think adding a conditional here to ensure the node is "eligible" would resolve this problem.
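
To illustrate the effect of such a conditional, here is a minimal sketch in Go. It is not the autoscaler's actual code: the `node` and `totals` types and the `sumAllocatable`, `readyOnly`, `readyAndEligible` and `exampleNodes` helpers are all hypothetical names made up for this example. The node list mirrors the `nomad node-status` output above, and the per-node figures are assumptions: 4400 MHz is taken from the node detail output, and 1951 MB is inferred from the reported allocatable_memory=15608 divided by the 8 nodes in "ready" status.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a Nomad node record.
type node struct {
	ID          string
	Status      string // "ready" or "down"
	Eligibility string // "eligible" or "ineligible"
	CPUMHz      int
	MemoryMB    int
}

// totals holds the summed allocatable resources of a node pool.
type totals struct {
	CPUMHz   int
	MemoryMB int
}

// sumAllocatable adds up the resources of every node accepted by keep.
func sumAllocatable(nodes []node, keep func(node) bool) totals {
	var t totals
	for _, n := range nodes {
		if keep(n) {
			t.CPUMHz += n.CPUMHz
			t.MemoryMB += n.MemoryMB
		}
	}
	return t
}

// readyOnly models the behaviour described in this issue:
// the node status is checked, the scheduling eligibility is not.
func readyOnly(n node) bool { return n.Status == "ready" }

// readyAndEligible adds the suggested eligibility conditional.
func readyAndEligible(n node) bool {
	return n.Status == "ready" && n.Eligibility == "eligible"
}

// exampleNodes rebuilds the node list from the issue.
// Assumption: every node offers 4400 MHz and 1951 MB.
func exampleNodes() []node {
	mk := func(id, eligibility, status string) node {
		return node{ID: id, Status: status, Eligibility: eligibility, CPUMHz: 4400, MemoryMB: 1951}
	}
	return []node{
		mk("54c1d969", "ineligible", "ready"),
		mk("99a91f0c", "ineligible", "ready"),
		mk("4c394a12", "ineligible", "ready"),
		mk("284cc4be", "ineligible", "ready"),
		mk("d90e705b", "eligible", "ready"),
		mk("e57184ee", "ineligible", "down"),
		mk("1e1c3d34", "ineligible", "down"),
		mk("db380668", "eligible", "ready"),
		mk("25860043", "ineligible", "down"),
		mk("d91509f1", "ineligible", "ready"),
		mk("8e9906f3", "ineligible", "down"),
		mk("6f615722", "ineligible", "down"),
		mk("24e0a7dc", "ineligible", "down"),
		mk("f1af8cee", "eligible", "ready"),
		mk("ca0f312f", "ineligible", "down"),
	}
}

func main() {
	nodes := exampleNodes()
	// prints: ready only:         {CPUMHz:35200 MemoryMB:15608}
	fmt.Printf("ready only:         %+v\n", sumAllocatable(nodes, readyOnly))
	// prints: ready and eligible: {CPUMHz:13200 MemoryMB:5853}
	fmt.Printf("ready and eligible: %+v\n", sumAllocatable(nodes, readyAndEligible))
}
```

Under these assumptions, filtering on status alone reproduces the allocatable_cpu=35200 and allocatable_memory=15608 values from the log line, while also requiring eligibility leaves only the three nodes that are actually schedulable.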

@jrasell moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage on Sep 18, 2024
Projects
Status: Needs Roadmapping
Development

No branches or pull requests

2 participants