
Karpenter doesn't create Nodes #1225

Closed
Izvi-digibank opened this issue Jan 26, 2022 · 16 comments

@Izvi-digibank

Installed Karpenter following the documentation. Created the following provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: provisioner-test
spec:
  requirements:
    - key: dwh
      operator: In
      values: ["yes-dwh"]
  taints:
    - key: dwh
      value: cronjobs-test
      effect: "NoSchedule"
  limits:
    resources:
      cpu: 1000
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-features
    subnetSelector: 
      kubernetes.io/cluster/features: '*'
    securityGroupSelector:
      Name: sg-***
  ttlSecondsAfterEmpty: 30

The resource I am trying to match with the provisioner above is:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: dwh-cron
spec:
  jobTemplate:
    spec:
      template:
        spec:
          tolerations:
            - key: "dwh"
              value: cronjobs-test
          nodeSelector:
            dwh: "yes-dwh"

The error I get:
2022-01-26T16:07:43.841Z DEBUG controller.selection Could not schedule pod, matched 0/1 provisioners, tried provisioner/provisioner-test: invalid nodeSelector "dwh", [yes-dwh] not in [] {"commit": "5047f3c", "pod": "dwh-dev/karpenter-test-4vgnb"}

Would appreciate your assistance on this issue. Thanks in advance.

@ellistarn
Contributor

Currently, only well-known labels are supported via requirements. For custom labels, use the explicit labels syntax:

labels:
  dwh: yes-dwh

I'm surprised our validation logic allowed this. @felix-zhe-huang can you take a look?
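With that change, the provisioner above would look like this (a sketch adapting the original spec; unchanged fields trimmed):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: provisioner-test
spec:
  # Custom labels are applied to every node this provisioner creates;
  # pods can then target them with a plain nodeSelector.
  labels:
    dwh: yes-dwh
  taints:
    - key: dwh
      value: cronjobs-test
      effect: "NoSchedule"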

@Izvi-digibank
Author

@ellistarn Thank you.
Looks like the node is being created but never comes alive:

2022-01-26T18:06:09.131Z	INFO	controller.provisioning	Batched 1 pods in 1.00072609s	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.138Z	DEBUG	controller.provisioning	Excluding instance type t3.nano because there are not enough resources for kubelet and system overhead	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.140Z	DEBUG	controller.provisioning	Excluding instance type t3a.nano because there are not enough resources for kubelet and system overhead	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.217Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [m1.small m1.medium m3.medium t3.micro t3a.micro c1.medium t3.small t3a.small c3.large c4.large c5d.large c5a.large t3.medium c6i.large t3a.medium c5ad.large c5.large c5n.large m3.large m1.large]	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.268Z	DEBUG	controller.provisioning	Discovered security groups: [sg-08162fc9a077d5ff8 sg-0b2181de7540db421]	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.268Z	DEBUG	controller.provisioning	Ignoring security group sg-0b2181de7540db421, only one group with tag kubernetes.io/cluster/features is allowed	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.271Z	DEBUG	controller.provisioning	Discovered kubernetes version 1.21	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.338Z	DEBUG	controller.provisioning	Discovered ami ami-0adc757be1e4e11a1 for query /aws/service/eks/optimized-ami/1.21/amazon-linux-2/recommended/image_id  	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.338Z	DEBUG	controller.provisioning	Discovered caBundle, length 1025	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:09.460Z	DEBUG	controller.provisioning	Created launch template, Karpenter-features-3699898603812528085	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:11.322Z	INFO	controller.provisioning	Launched instance: i-0db1827fa220659b2, hostname: ip-172-31-4-22.eu-west-1.compute.internal, type: t3a.micro, zone: eu-west-1b, capacityType: on-demand	{"commit": "5047f3c", "provisioner": "provisioner-test"}
2022-01-26T18:06:11.349Z	INFO	controller.provisioning	Bound 1 pod(s) to node ip-172-31-4-22.eu-west-1.compute.internal	{"commit": "5047f3c", "provisioner": "provisioner-test"}
➜  kubectl get nodes
NAME                                          STATUS     ROLES    AGE     VERSION
ip-172-31-4-22.eu-west-1.compute.internal     NotReady   <none>   10m

[screenshots: EC2 console showing the launched instances]

New nodes keep trying to come up every ~6 minutes; the status of all of them is "unknown".

Can you please suggest?

@ellistarn
Contributor

Are you following the getting started guide? There are many reasons the node can't connect.

  • instance profile needs the right permissions
  • security groups need connectivity to the masters
  • iam role needs to be granted access.

Try logging into the node with

aws ssm start-session --target $(kubectl get node -l karpenter.sh/provisioner-name -ojson | jq -r ".items[0].spec.providerID" | cut -d \/ -f5)

and then reading the kubelet logs with

sudo journalctl -u kubelet

@alekc
Contributor

alekc commented Jan 26, 2022

@Izvi-digibank check your subnet selectors. I had a similar issue; it was caused by nodes binding to a public subnet instead of a private one.
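One way to check which subnets that selector actually matches (a sketch, assuming the cluster tag from the provisioner above and AWS CLI credentials for the right account):

aws ec2 describe-subnets \
  --filters "Name=tag:kubernetes.io/cluster/features,Values=*" \
  --query "Subnets[].{ID:SubnetId,CIDR:CidrBlock,PublicIPOnLaunch:MapPublicIpOnLaunch}" \
  --output table

Subnets that report PublicIPOnLaunch as true are typically public; if any appear, narrow the subnetSelector (or re-tag the subnets) so only private ones match.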

@Izvi-digibank
Author

Izvi-digibank commented Jan 27, 2022

@ellistarn I am attaching all the relevant configuration, all following the documentation. @alekc the subnet selector is 100% private. Would appreciate your further attention to the following details:

Instance profile has the right permissions (Policy is "AmazonSSMManagedInstanceCore"):

[screenshot: instance profile in the IAM console]

AmazonSSMManagedInstanceCore policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeAssociation",
                "ssm:GetDeployablePatchSnapshotForInstance",
                "ssm:GetDocument",
                "ssm:DescribeDocument",
                "ssm:GetManifest",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:ListAssociations",
                "ssm:ListInstanceAssociations",
                "ssm:PutInventory",
                "ssm:PutComplianceItems",
                "ssm:PutConfigurePackageResult",
                "ssm:UpdateAssociationStatus",
                "ssm:UpdateInstanceAssociationStatus",
                "ssm:UpdateInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2messages:AcknowledgeMessage",
                "ec2messages:DeleteMessage",
                "ec2messages:FailMessage",
                "ec2messages:GetEndpoint",
                "ec2messages:GetMessages",
                "ec2messages:SendReply"
            ],
            "Resource": "*"
        }
    ]
}

KarpenterNodeInstanceProfile-features Trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

KarpenterController IAM Role policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:RunInstances",
                "ec2:CreateTags",
                "iam:PassRole",
                "ec2:TerminateInstances",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ssm:GetParameter"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

KarpenterController Trust Relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::***:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/***"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.eu-west-1.amazonaws.com/id/***": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}

Karpenter 0.5.3 helm chart values file:


serviceAccount:
  # -- Create a service account for the application controller
  create: true
  # -- Service account name
  name: karpenter
  # -- Annotations to add to the service account (like the ARN of the IRSA role)
  annotations: {eks.amazonaws.com/role-arn: arn:aws:iam::***:role/KarpenterController}
    
controller:
  # -- Additional environment variables to run with
  ## - name: AWS_REGION
  ## - value: eu-west-1
  env: []
  # -- Node selectors to schedule to nodes with labels.
  nodeSelector: {}
  # -- Tolerations to schedule to nodes with taints.
  tolerations: []
  # -- Affinity rules for scheduling
  affinity: {}
  # -- Image to use for the Karpenter controller
  image: "public.ecr.aws/karpenter/controller:v0.5.3@sha256:ddd24d756cb324cf8f91f2274621646f83d6121ed6856312ca672a5f78c57174"
  # -- Cluster name
  clusterName: "features"
  # -- Cluster endpoint
  clusterEndpoint: "https://***.gr7.eu-west-1.eks.amazonaws.com"
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 1
      memory: 1Gi
  replicas: 1
webhook:
  # -- List of environment items to add to the webhook
  env: []
  # -- Node selectors to schedule to nodes with labels.
  nodeSelector: {}
  # -- Tolerations to schedule to nodes with taints.
  tolerations: []
  # -- Affinity rules for scheduling
  affinity: {}
  # -- Image to use for the webhook
  image: "public.ecr.aws/karpenter/webhook:v0.5.3@sha256:19a1e1f2c8ec6ece1b170584dd6251d2e00f1676503a65d1433f45f46e330ddf"
  # -- Set to true if using custom CNI on EKS
  hostNetwork: true
  port: 8443
  resources:
    limits:
      cpu: 200m
      memory: 100Mi
    requests:
      cpu: 200m
      memory: 100Mi
  replicas: 1

Also, I'd expect to see logs while the node is coming up.
@ellistarn
aws ssm start-session --target $(kubectl get node -l karpenter.sh/provisioner-name -ojson | jq -r ".items[0].spec.providerID" | cut -d \/ -f5)

I get i-0c8ddaf2ca6b7427f, which is the instance.

@ellistarn added the bug and burning labels on Jan 27, 2022
@ellistarn
Contributor

Can you connect using aws ssm start-session --target i-0c8ddaf2ca6b7427f and then check the kubelet logs, mentioned above?

@Izvi-digibank
Author

Izvi-digibank commented Jan 27, 2022

➜  ~ aws ssm start-session --target i-09986a80810518f03 --profile dev

Starting session with SessionId: iz@digibank-0276ae4db2a9663f3
sh-4.2$
sh-4.2$
sh-4.2$ sudo journalctl -u kubelet
-- No entries --

@ellistarn No logs visible :(
Did you happen to view the details I added in the last comment? Do all my configurations look okay to you?

@ellistarn
Contributor

ellistarn commented Jan 27, 2022

Your instance profile needs the 4 policies:

      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"

@Izvi-digibank
Author

Thanks, I added those policies. In my opinion this isn't clear enough in the documentation; I'd suggest an edit.

New nodes are still in an unknown state; however, I was able to get some kubelet logs:

Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: I0127 20:03:42.631981    3183 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Unauthorized
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.645985    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.747172    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.847984    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.945556    3183 eviction_manager.go:255] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.948822    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:42 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:42.980458    3183 kubelet.go:2214] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.049754    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.150725    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.251498    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.352811    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.453649    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.553948    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: I0127 20:03:43.632164    3183 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Unauthorized
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.655409    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.756325    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.857267    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:43 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:43.958160    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.059166    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.160706    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.262109    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.363362    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.464479    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"
Jan 27 20:03:44 ip-172-31-9-245.eu-west-1.compute.internal kubelet[3183]: E0127 20:03:44.565171    3183 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-245.eu-west-1.compute.internal\" not found"

@ellistarn Any advice?

@alekc
Contributor

alekc commented Jan 27, 2022

@Izvi-digibank
from your logs: Failed to contact API server when waiting for CSINode publishing: Unauthorized

I do not see the following in your configuration:

set {
    name  = "aws.defaultInstanceProfile"
    value = aws_iam_instance_profile.karpenter.name
  }

(check the docs https://karpenter.sh/v0.5.6/getting-started-with-terraform/#install-karpenter-helm-chart)
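For an install driven by the helm CLI rather than Terraform, the equivalent would be something like (a sketch; assumes the chart repo is already added as karpenter, and reuses the instance profile name from the provisioner above):

helm upgrade --install karpenter karpenter/karpenter \
  --namespace karpenter \
  --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-features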

@Izvi-digibank
Author

Izvi-digibank commented Jan 27, 2022

@alekc I'm using v0.5.3 https://karpenter.sh/v0.5.3/getting-started-with-terraform/
This parameter isn't mentioned in that version's documentation.


Edit: I upgraded to v0.5.6 and added the value of aws.defaultInstanceProfile to the helm chart.
Still getting the same results; nothing changed.

Do my trust relationships look okay, for both KarpenterController and KarpenterNodeInstanceProfile-features?

@Izvi-digibank
Author

Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.669754    3130 kubelet_node_status.go:429] "Adding node label from cloud provider" labelKey="failure-domain.beta.kubernetes.io/region" labelValue="eu-west-1"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.669767    3130 kubelet_node_status.go:431] "Adding node label from cloud provider" labelKey="topology.kubernetes.io/region" labelValue="eu-west-1"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.671629    3130 kubelet_node_status.go:554] "Recording event message for node" node="ip-172-31-9-51.eu-west-1.compute.internal" event="NodeHasSufficientMemory"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.672004    3130 kubelet_node_status.go:554] "Recording event message for node" node="ip-172-31-9-51.eu-west-1.compute.internal" event="NodeHasNoDiskPressure"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.672287    3130 kubelet_node_status.go:554] "Recording event message for node" node="ip-172-31-9-51.eu-west-1.compute.internal" event="NodeHasSufficientPID"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:56.672558    3130 kubelet_node_status.go:71] "Attempting to register node" node="ip-172-31-9-51.eu-west-1.compute.internal"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:56.696006    3130 kubelet_node_status.go:93] "Unable to register node with API server" err="Unauthorized" node="ip-172-31-9-51.eu-west-1.compute.internal"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:56.739358    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:56.839931    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:56.941040    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.041166    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.141854    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.242647    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.343159    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.444323    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: I0127 21:23:57.488624    3130 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Unauthorized
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.545135    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.645959    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.747037    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.848239    3130 kubelet.go:2294] "Error getting node" err="node \"ip-172-31-9-51.eu-west-1.compute.internal\" not found"
Jan 27 21:23:57 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:57.855915    3130 kubelet.go:2214] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

Got some new logs from the kubelet.

@ellistarn
Contributor

Jan 27 21:23:56 ip-172-31-9-51.eu-west-1.compute.internal kubelet[3130]: E0127 21:23:56.696006 3130 kubelet_node_status.go:93] "Unable to register node with API server" err="Unauthorized" node="ip-172-31-9-51.eu-west-1.compute.internal"

Your node can't communicate with the API Server.

Here's an example of my aws-auth configmap

k get configmaps -n kube-system aws-auth -oyaml
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::767520670908:role/KarpenterNodeRole-etarn
      username: system:node:{{EC2PrivateDNSName}}
  mapUsers: |
    []
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system

In the future, I highly recommend following or directly translating one of the guides.

@Izvi-digibank
Author

You're correct; apparently my aws-auth did not get updated. Closing this thread. Thanks.

@devopsjnr

@ellistarn @felix-zhe-huang I'm encountering the same error; could you please take a look? #1683

@kaiohenricunha

@Izvi-digibank check your subnet selectors. I had a similar issue; it was caused by nodes binding to a public subnet instead of a private one.

That was my problem too. Removing the discovery tag from public subnets and then deleting the stuck nodeclaim and instance rapidly resolved the issue.
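
For anyone hitting this on a recent version, the cleanup might look like (a sketch; names are placeholders, and the NodeClaim API only exists in newer Karpenter releases):

# Delete the stuck nodeclaim so Karpenter stops waiting on it...
kubectl delete nodeclaim <stuck-nodeclaim-name>
# ...then terminate the orphaned instance so a fresh node can be provisioned.
aws ec2 terminate-instances --instance-ids <instance-id>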
