
Unable to mount PV when EKS Node is Encrypted #574

Closed
geoffo-dev opened this issue Oct 24, 2021 · 8 comments
Labels
kind/bug, lifecycle/rotten

Comments

@geoffo-dev

/kind bug

What happened?
I have attempted to update my nodes to use encrypted images rather than the default unencrypted images EKS provides. To do this I have been using the AWS EKS Terraform module and switched to launch templates, as shown in their example:

https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/launch_templates_with_managed_node_groups

This is the launch template code I am using:

resource "aws_launch_template" "default_node" {
  name_prefix            = "${var.name}-node"
  description            = "eks Node Launch-Template"
  image_id               = var.ami_id == null ? data.aws_ami.golden_image.image_id : var.ami_id
  update_default_version = true
  user_data              = data.cloudinit_config.instance.rendered

  block_device_mappings {
    device_name = var.root_volume_device_name
    ebs {
      encrypted             = true
      kms_key_id            = var.cmk
      volume_size           = var.root_volume_size
      volume_type           = var.root_volume_type
      delete_on_termination = var.root_volume_delete_on_termination

    }
  }
  monitoring {
    enabled = true
  }
  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    security_groups             = compact([ 
                                    module.eks_cluster.worker_security_group_id, 
                                  ])
  }

  lifecycle {
    create_before_destroy = true
  }
}
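
For completeness, the module wires this template into the managed node group for me; a plain aws_eks_node_group equivalent would look roughly like the sketch below (resource names, variables, scaling numbers and the module output name are illustrative, not copied from my config):

resource "aws_eks_node_group" "default" {
  cluster_name    = module.eks_cluster.cluster_id # output name may differ by module version
  node_group_name = "${var.name}-node"
  node_role_arn   = var.node_role_arn      # illustrative
  subnet_ids      = var.private_subnet_ids # illustrative

  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 4
  }

  # Pin the node group to the launch template above so the encrypted root
  # volume and custom AMI settings are actually used.
  launch_template {
    id      = aws_launch_template.default_node.id
    version = aws_launch_template.default_node.latest_version
  }
}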

I have created a key specifically for the cluster (and included the autoscaling role and cluster IAM role in the key permissions); however, every time I update the cluster, none of the pods that use persistent volumes can mount them.
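
For reference, the key policy is roughly along the lines of the sketch below - the account lookup, statement layout and the var.cluster_iam_role_arn variable are illustrative rather than copied from my actual config. The important part is that the autoscaling service-linked role (and the cluster IAM role) can use the key and create grants; otherwise the ASG cannot launch instances whose root volume is encrypted with a customer-managed key.

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "cluster_cmk" {
  # Key administration stays with the account root.
  statement {
    sid       = "KeyAdministration"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Allow the autoscaling service-linked role and the cluster IAM role to use the key.
  statement {
    sid = "AllowUseOfTheKey"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
        var.cluster_iam_role_arn, # illustrative variable name
      ]
    }
  }

  # The service-linked role also needs CreateGrant so the ASG can attach
  # the encrypted volume to newly launched instances.
  statement {
    sid       = "AllowGrantCreation"
    actions   = ["kms:CreateGrant"]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
      ]
    }

    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = ["true"]
    }
  }
}

resource "aws_kms_key" "cluster" {
  description = "CMK for EKS node root volumes"
  policy      = data.aws_iam_policy_document.cluster_cmk.json
}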

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    10m                  default-scheduler  Successfully assigned ****/****-57b9fb8468-gqt6g to ip-xxxx.eu-west-2.compute.internal
  Warning  FailedMount  8m18s                kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[aws-iam-token efs-data-volume xxxx-role-token-j828v]: timed out waiting for the condition
  Warning  FailedMount  3m42s                kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[xxxx-role-token-j828v aws-iam-token efs-data-volume]: timed out waiting for the condition
  Warning  FailedMount  88s (x2 over 6m)     kubelet            Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[efs-data-volume xxxx-role-token-j828v aws-iam-token]: timed out waiting for the condition
  Warning  FailedMount  13s (x5 over 8m20s)  kubelet            MountVolume.SetUp failed for volume "xxxx-pv-efsdata" : rpc error: code = DeadlineExceeded desc = context deadline exceeded

I have tried looking through the logs, but have been unable to understand why this might be happening - really the only difference I am introducing is the KMS key on the root volume (as far as I can tell).

There are very limited logs showing what is going on.

What you expected to happen?

Volumes mount as they would on an unencrypted EKS node.

Environment

  • Kubernetes Version 1.20
  • CSI Driver Version - aws-efs-csi-controller:v1.3.4
@k8s-ci-robot added the kind/bug label Oct 24, 2021
@geoffo-dev
Author

Apologies - this might not be a bug and might not have anything to do with the EFS Driver, but it appears to be the only thing affected by the change... It could possibly be something to do with the AWS CNI Driver...!

@sharmavijay86

@geoffo-dev Did you get it working? I also have the same issue.

@geoffo-dev
Author

Hey @sharmavijay86 - so sadly not... I haven't had a chance to look at it over the last couple of days, but the need is becoming more urgent.

Unfortunately our policies do not allow us to use unencrypted or AWS-provided images, so it is difficult to test this at the moment, but the only thing I can see it being is an EBS issue at this stage.

I am going to try and figure it out - if I do, I will make sure I post on here!

Good to know in a way it is not just me!!

@geoffo-dev
Author

@sharmavijay86 so I think I have got to the bottom of it, and hopefully this will help anyone else... it would appear the example in the module is (possibly) wrong.

It tells you to attach this security group in the launch template:

 module.eks_cluster.worker_security_group_id

However it should be:

module.eks_cluster.cluster_primary_security_group_id

This is because the ENIs attached to the EFS mount targets use this security group as the allowed source, and therefore it needs to be attached to the node... this has been bugging me for weeks.
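
Putting that together, the network_interfaces block in the launch template above ends up roughly like this (only the security group output changes; everything else is exactly as posted earlier):

  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    # Attach the cluster primary security group so traffic from the node is
    # accepted by the EFS mount target ENIs, which reference this group.
    security_groups = compact([
      module.eks_cluster.cluster_primary_security_group_id,
    ])
  }

Alternatively (untested on my side), it should also work to keep the worker security group on the nodes and instead add an NFS (TCP 2049) ingress rule from that worker security group on whatever security group the EFS mount targets use.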

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 25, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

  The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. […]

  /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
