
Failing to provision EC2 node even though matching EC2 node type exists #1212

Closed
kamialie opened this issue Apr 25, 2024 · 11 comments
Labels
kind/support Categorizes issue or PR as a support question. lifecycle/closed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@kamialie

kamialie commented Apr 25, 2024

Description

Observed Behavior:

Karpenter cannot schedule the pod, claiming that the scheduling requirements were not met.

The pod should run on an i3en.24xlarge EC2 node, and as far as I can tell its resource requests fit within the resources available on this node type (including the daemonset pods' overhead). If an i3en.24xlarge node is already running in the cluster, the Kubernetes scheduler successfully assigns this pod to that node; the only problem is getting Karpenter to provision one. One workaround I found is to scale up another deployment whose CPU request is set to 40: two of its pods force Karpenter to provision the node I need, and the Kubernetes scheduler can then assign my original pod to it.

Setting the Karpenter log level to debug unfortunately didn't reveal anything more (I only set the logLevel Helm chart property to debug; is there another place to increase verbosity further?). Here is what Karpenter currently reports:

incompatible with nodepool \"example\", daemonset overhead={\"cpu\":\"275m\",\"memory\":\"360Mi\",\"pods\":\"7\"}, no instance type satisfied resources {\"cpu\":\"80275m\",\"memory\":\"737640Mi\",\"pods\":\"8\"} and requirements karpenter.k8s.aws/instance-family In [i3en], karpenter.k8s.aws/instance-size NotIn [metal], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [example], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], role In [example], topology.kubernetes.io/zone In [eu-central-1a eu-central-1b eu-central-1c] (no instance type which had enough resources and the required offering met the scheduling requirements)
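The resource totals in this log line appear to be the pod's own requests plus the daemonset overhead printed alongside them. The minimal sketch below reproduces that sum. It assumes the pod itself counts as one entry toward "pods", and it only handles the quantity suffixes that appear in this issue (m, Mi, Gi), so it is not a general Kubernetes quantity parser.

```python
# Minimal sketch: add the pod's requests to the daemonset overhead from the
# log line and compare with the totals Karpenter printed.
# Assumption: the pod itself contributes 1 to the "pods" count.

MIB_PER_UNIT = {"Mi": 1, "Gi": 1024}


def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as "80" or "275m" to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(value) * 1000


def memory_to_mib(value: str) -> int:
    """Convert a memory quantity such as "720Gi" or "360Mi" to MiB."""
    return int(value[:-2]) * MIB_PER_UNIT[value[-2:]]


pod = {"cpu": "80", "memory": "720Gi", "pods": 1}
daemonset_overhead = {"cpu": "275m", "memory": "360Mi", "pods": 7}

total_cpu_m = cpu_to_millicores(pod["cpu"]) + cpu_to_millicores(daemonset_overhead["cpu"])
total_memory_mib = memory_to_mib(pod["memory"]) + memory_to_mib(daemonset_overhead["memory"])
total_pods = pod["pods"] + daemonset_overhead["pods"]

print(f"{total_cpu_m}m", f"{total_memory_mib}Mi", total_pods)
# prints: 80275m 737640Mi 8
```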

Overall it looks like something is wrong with how Karpenter calculates memory requirements, or maybe the problem is on the AWS side, in whatever AWS returns when Karpenter fetches information on the available instance types; so the issue could be in Karpenter or in the AWS provider. Maybe the reason is that when such a node is running, Kubernetes reports it as 97% consumed, and there is some kind of threshold that Karpenter doesn't want to cross? I would still expect Karpenter to be able to provision a node, since the scheduler had no problem assigning that pod.

Expected Behavior:

Karpenter identifies matching node and provisions it.

Reproduction Steps (Please include YAML):

Karpenter node definition:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example
spec:
  amiFamily: AL2
  instanceProfile: KarpenterNode
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: dev
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: dev
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: example
spec:
  disruption:
    consolidateAfter: 15m
    consolidationPolicy: WhenEmpty
    expireAfter: 336h
  template:
    metadata:
      labels:
        role: example
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: example
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - i3en
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values:
        - metal

Sample pod:

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  nodeSelector:
    role: example
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx
    resources:
      requests:
        cpu: "80"
        memory: 720Gi

Versions:

  • Chart Version: 0.36.0
  • Kubernetes Version (kubectl version):
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111
@kamialie kamialie added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 25, 2024
@thehandsomezebra

thehandsomezebra commented Apr 25, 2024

I am currently experiencing the exact same issue and am troubleshooting it with AWS now. We are running chart version 0.35.0.

Edit: We are using amiFamily: Bottlerocket with both Spot and On-Demand instances.

@kamialie
Author

Right, I didn't mention that our setup was only recently upgraded from k8s 1.27 and Karpenter 0.28.1, so this issue has existed for quite some time.

@engedaam
Contributor

engedaam commented May 3, 2024

You might need to adjust VM_MEMORY_OVERHEAD_PERCENT: currently Karpenter subtracts a fixed percentage of memory, 7.5%, off the top for all instance types (https://karpenter.sh/docs/reference/settings/). On the EC2 side, there is a PR to compute the memory overhead dynamically for each instance type: aws/karpenter-provider-aws#4517. I recommend moving this issue to the AWS provider repo, https://github.com/aws/karpenter-provider-aws. This issue may also help in scoping this behavior down: aws/karpenter-provider-aws#5161.
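To see how much the 7.5% overhead matters for the original report, here is a simplified sketch of the "memory off the top" step. It is not Karpenter's actual code. It assumes a nominal 768 GiB (786432 MiB) for i3en.24xlarge and rounds the overhead up to whole MiB, and it ignores kube-reserved memory, system-reserved memory and eviction thresholds, all of which would lower the usable figure further. The requested value is the memory total from the log line above.

```python
import math

# Simplified sketch of subtracting VM_MEMORY_OVERHEAD_PERCENT from nominal
# instance memory. Assumptions: default overhead of 0.075, nominal 768 GiB for
# i3en.24xlarge, overhead rounded up to whole MiB. Kube-reserved, system-reserved
# and eviction thresholds are NOT subtracted here.

VM_MEMORY_OVERHEAD_PERCENT = 0.075


def memory_after_vm_overhead_mib(nominal_mib: int, overhead: float = VM_MEMORY_OVERHEAD_PERCENT) -> int:
    """Return nominal memory minus a rounded-up overhead percentage, in MiB."""
    return nominal_mib - math.ceil(nominal_mib * overhead)


i3en_24xlarge_mib = 768 * 1024   # 786432 MiB nominal (assumption)
requested_mib = 737640           # pod + daemonset memory from the log line

usable_mib = memory_after_vm_overhead_mib(i3en_24xlarge_mib)
print(usable_mib, usable_mib >= requested_mib)
# prints: 727449 False
```

Passing a smaller `overhead` value shows how the setting shifts this estimate; whether the pod then fits still depends on the deductions the sketch leaves out.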

@kamialie
Copy link
Author

kamialie commented May 4, 2024

Thanks a lot for providing so much context and the related issues and pull requests! I'll double-check what the kubelet reports for this instance type and see whether the math works out with the overhead Karpenter adds, but that seems to explain it.

The core problem of the kubelet reporting less memory than expected may have different solutions. I could also imagine an optional field in EC2NodeClass for this, though I don't know how most users split their node classes for their use cases, or whether such a field would help them. It would definitely help my use case.

@engedaam
Contributor

engedaam commented May 6, 2024

/remove-kind bug
/remove needs-triage
/kind support

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 6, 2024
@engedaam
Contributor

engedaam commented May 6, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2024
@jcmcken

jcmcken commented May 10, 2024

Also seeing this issue in 0.36.1. We're trying to use GPU nodes and setting nodeSelector to karpenter.k8s.aws/instance-category: g. If a node with that category already exists in the cluster, it works. If not, I get the same errors as OP.

I even checked my node pool requirements against https://karpenter.sh/docs/reference/instance-types/#g5xlarge (for example), and they should match and select this instance type. But it doesn't seem to work.

@jcmcken

jcmcken commented May 11, 2024

Never mind, I think the problem was that AWS had no capacity left for any of the instance types our node pool was selecting. When I broadened the node pool requirements, it was finally able to find a match. I also enabled the Prometheus metrics, which include some useful metrics on instance type availability. Maybe the error message in this case is a bit too vague: no instance type which had enough resources and the required offering met the scheduling requirements


This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2024
@github-actions github-actions bot closed this as not planned Jun 9, 2024
@pecigonzalo

pecigonzalo commented Oct 4, 2024

Sorry to resurrect this issue, but I don't think this is related to VM_MEMORY_OVERHEAD_PERCENT, because as far as I understand, "no instance type which had enough resources and the required offering met the scheduling requirements" indicates that the instance type:

  • has enough resources
  • fits the offering requirements for the nodepool
  • but does not meet the scheduling requirements

Is my understanding correct? Relevant code.

If my understanding is correct, it is unclear why the Pods are not getting a Node allocated, and I think it would be helpful for the error to state more clearly what did not match.

If it is incorrect, then I also believe the error message is unclear, as this should trigger fits = false and this code path, right?
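The suggestion above amounts to reporting, per instance type, which check eliminated it. The sketch below is a hypothetical illustration of that idea, not Karpenter's implementation: the `Candidate` fields, the order in which the checks run, and the example names and values are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch (not Karpenter's code): run each candidate instance type
# through the three checks named in the error message and record the first one
# that fails, instead of collapsing everything into one combined message.
# The check order below is an assumption.


@dataclass
class Candidate:
    name: str
    meets_requirements: bool  # node requirements compatible with the pod
    has_offering: bool        # an available offering for the zone/capacity type
    fits_resources: bool      # allocatable covers pod + daemonset requests


def explain(candidates: list[Candidate]) -> dict[str, str]:
    """Map each candidate name to the first failed check, or "ok"."""
    reasons = {}
    for c in candidates:
        if not c.meets_requirements:
            reasons[c.name] = "requirements not met"
        elif not c.fits_resources:
            reasons[c.name] = "insufficient resources"
        elif not c.has_offering:
            reasons[c.name] = "no matching offering"
        else:
            reasons[c.name] = "ok"
    return reasons


# Illustrative values only.
result = explain([
    Candidate("example-a", meets_requirements=True, has_offering=False, fits_resources=True),
    Candidate("example-b", meets_requirements=True, has_offering=True, fits_resources=False),
])
print(result)
# prints: {'example-a': 'no matching offering', 'example-b': 'insufficient resources'}
```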

@pecigonzalo

Sample config to replicate:

NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: bubblefly
spec:
  disruption:
    budgets:
      - nodes: "1"
    consolidateAfter: 5m
    consolidationPolicy: WhenEmpty
  limits:
    cpu: "80"
  template:
    metadata:
      labels:
        nodepool/name: bubblefly
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: bottlerocket
      requirements:
        - key: karpenter.k8s.aws/instance-family
          minValues: 1
          operator: In
          values:
            - r6a
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - 2xlarge
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand

Workload:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: bubble
  name: bubble
  namespace: default
spec:
  podManagementPolicy: Parallel
  replicas: 5
  selector:
    matchLabels:
      app: bubble
  template:
    metadata:
      labels:
        app: bubble
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nodepool/name
                    operator: In
                    values:
                      - bubblefly
      containers:
        - name: bubble
          image: registry.k8s.io/pause
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              memory: 59Gi
            requests:
              cpu: 6800m
              memory: 59Gi
      restartPolicy: Always
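For this repro, the same simplified VM-overhead arithmetic gives a rough idea of the margin involved. This is a sketch under assumptions not stated in the thread: r6a.2xlarge is taken as a nominal 64 GiB, only the default 7.5% VM_MEMORY_OVERHEAD_PERCENT is subtracted (rounded up to whole MiB), and kube-reserved memory, eviction thresholds and daemonset requests are deliberately left out.

```python
import math

# Rough headroom estimate for the StatefulSet above on r6a.2xlarge.
# Assumptions: nominal 64 GiB, default 0.075 VM overhead rounded up to whole MiB.
# Kube-reserved memory, eviction thresholds and daemonset requests are NOT included.

VM_MEMORY_OVERHEAD_PERCENT = 0.075

nominal_mib = 64 * 1024  # 65536 MiB (assumption)
after_overhead_mib = nominal_mib - math.ceil(nominal_mib * VM_MEMORY_OVERHEAD_PERCENT)
pod_request_mib = 59 * 1024  # 59Gi request from the StatefulSet

headroom_mib = after_overhead_mib - pod_request_mib
print(after_overhead_mib, headroom_mib)
# prints: 60620 204
```

With only about 204 MiB left after the VM overhead in this estimate, the deductions the sketch omits are what decide whether the pod can fit on this instance type.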


6 participants