
Unable to mount PV when EKS Node is Encrypted #574

Closed
geoffo-dev opened this issue Oct 24, 2021 · 8 comments
Labels
kind/bug, lifecycle/rotten

Comments

@geoffo-dev

/kind bug

What happened?
I have attempted to update my nodes to use encrypted images rather than the default unencrypted images EKS provides. To do this I have been using the AWS EKS Terraform module and switched to launch templates, as shown in their example:

https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/launch_templates_with_managed_node_groups

This is the launch template code I am using:

resource "aws_launch_template" "default_node" {
  name_prefix            = "${var.name}-node"
  description            = "eks Node Launch-Template"
  image_id               = var.ami_id == null ? data.aws_ami.golden_image.image_id : var.ami_id
  update_default_version = true
  user_data              = data.cloudinit_config.instance.rendered

  block_device_mappings {
    device_name = var.root_volume_device_name
    ebs {
      encrypted             = true
      kms_key_id            = var.cmk
      volume_size           = var.root_volume_size
      volume_type           = var.root_volume_type
      delete_on_termination = var.root_volume_delete_on_termination

    }
  }
  monitoring {
    enabled = true
  }
  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    security_groups             = compact([ 
                                    module.eks_cluster.worker_security_group_id, 
                                  ])
  }

  lifecycle {
    create_before_destroy = true
  }
}
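
For completeness, the module wires this template into the managed node group for me; a plain aws_eks_node_group equivalent would look roughly like the sketch below (resource names, variables, scaling numbers and the module output name are illustrative, not copied from my config):

resource "aws_eks_node_group" "default" {
  cluster_name    = module.eks_cluster.cluster_id # output name may differ by module version
  node_group_name = "${var.name}-node"
  node_role_arn   = var.node_role_arn      # illustrative
  subnet_ids      = var.private_subnet_ids # illustrative

  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 4
  }

  # Pin the node group to the launch template above so the encrypted root
  # volume and custom AMI settings are actually used.
  launch_template {
    id      = aws_launch_template.default_node.id
    version = aws_launch_template.default_node.latest_version
  }
}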

I have created a key specifically for the cluster (and included the autoscaling role and cluster IAM role in the key permissions); however, every time I update the cluster, none of the pods that use persistent volumes can mount them.
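
For reference, the key policy is roughly along the lines of the sketch below - the account lookup, statement layout and the var.cluster_iam_role_arn variable are illustrative rather than copied from my actual config. The important part is that the autoscaling service-linked role (and the cluster IAM role) can use the key and create grants; otherwise the ASG cannot launch instances whose root volume is encrypted with a customer-managed key.

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "cluster_cmk" {
  # Key administration stays with the account root.
  statement {
    sid       = "KeyAdministration"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Allow the autoscaling service-linked role and the cluster IAM role to use the key.
  statement {
    sid = "AllowUseOfTheKey"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
        var.cluster_iam_role_arn, # illustrative variable name
      ]
    }
  }

  # The service-linked role also needs CreateGrant so the ASG can attach
  # the encrypted volume to newly launched instances.
  statement {
    sid       = "AllowGrantCreation"
    actions   = ["kms:CreateGrant"]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
      ]
    }

    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = ["true"]
    }
  }
}

resource "aws_kms_key" "cluster" {
  description = "CMK for EKS node root volumes"
  policy      = data.aws_iam_policy_document.cluster_cmk.json
}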

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    10m                  default-scheduler  Successfully assigned ****/****-57b9fb8468-gqt6g to ip-xxxx.eu-west-2.compute.internal
  Warning  FailedMount  8m18s                kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[aws-iam-token efs-data-volume xxxx-role-token-j828v]: timed out waiting for the condition
  Warning  FailedMount  3m42s                kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[xxxx-role-token-j828v aws-iam-token efs-data-volume]: timed out waiting for the condition
  Warning  FailedMount  88s (x2 over 6m)     kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[efs-data-volume xxxx-role-token-j828v aws-iam-token]: timed out waiting for the condition
  Warning  FailedMount  13s (x5 over 8m20s)  kubelet            MountVolume.SetUp failed for volume "xxxx-pv-efsdata" : rpc error: code = DeadlineExceeded desc = context deadline exceeded

I have tried looking through the logs, but have been unable to understand why this might be happening - really the only difference I am introducing is the KMS key on the root volume (as far as I can tell).

There are very limited logs showing what is going on.

What you expected to happen?

Volumes mount as they would on an unencrypted EKS node.

Environment

  • Kubernetes Version 1.20
  • CSI Driver Version - aws-efs-csi-controller:v1.3.4
@k8s-ci-robot added the kind/bug label Oct 24, 2021
@geoffo-dev
Author

Apologies - this might not be a bug and might not have anything to do with the EFS Driver, but it appears to be the only thing affected by the change... It could possibly be something to do with the AWS CNI Driver...!

@sharmavijay86

@geoffo-dev Did you get it working? I also have the same issue.

@geoffo-dev
Author

Hey @sharmavijay86 - so sadly not... I haven't had a chance to look at it over the last couple of days, but the need is becoming more urgent.

Unfortunately our policies do not allow us to use unencrypted or AWS-provided images, so it is difficult to test this at the moment, but the only thing I can see it being is an EBS issue at this stage.

I am going to try and figure it out - if I do, I will make sure I post on here!

Good to know in a way it is not just me!!

@geoffo-dev
Author

@sharmavijay86 so I think I have got to the bottom of it, and hopefully this will help anyone else... it would appear the example in the module is (possibly) wrong.

It tells you to attach this security group in the launch template:

 module.eks_cluster.worker_security_group_id

However it should be:

module.eks_cluster.cluster_primary_security_group_id

This is because the ENIs attached to the EFS mount targets use this security group as the allowed source, and therefore it needs to be attached to the node... this has been bugging me for weeks.
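
Putting that together, the network_interfaces block in the launch template above ends up roughly like this (only the security group output changes; everything else is exactly as posted earlier):

  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    # Attach the cluster primary security group so traffic from the node is
    # accepted by the EFS mount target ENIs, which reference this group.
    security_groups = compact([
      module.eks_cluster.cluster_primary_security_group_id,
    ])
  }

Alternatively (untested on my side), it should also work to keep the worker security group on the nodes and instead add an NFS (TCP 2049) ingress rule from that worker security group on whatever security group the EFS mount targets use.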

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 25, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

  The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. […]

  /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
