Managed Node Groups with Launch Template using Spot Instances: Taints #1214

Closed
cabrinha opened this issue Feb 1, 2021 · 8 comments

@cabrinha (Contributor) commented Feb 1, 2021

I have issues

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request - read the FAQ first!
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

Creating a managed node group does allow using Spot instances and setting additional Kubernetes labels on the nodes, but there doesn't seem to be a way to specify taints using kubelet_extra_args.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Working from the example: https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/launch_templates_with_managed_node_groups

Adding kubelet_extra_args to the template_file:

data "template_file" "nginx" {
  template = file("${path.module}/templates/userdata.sh.tpl")

  vars = {
    cluster_name        = local.cluster_name
    endpoint            = module.eks.cluster_endpoint
    cluster_auth_base64 = module.eks.cluster_certificate_authority_data

    bootstrap_extra_args = ""
    kubelet_extra_args   = "--node-labels=node.kubernetes.io/lifecycle=spot,group=nginx,role=ingress-controllers --register-with-taints=dedicated=ingress-controllers:NoSchedule"
  }
}
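
As an aside, Terraform 0.12+ can render this with the built-in templatefile() function instead of the template_file data source (the template provider is deprecated). A minimal sketch, assuming the same userdata.sh.tpl and variables; the local name is illustrative:

# Hypothetical equivalent using templatefile()
locals {
  nginx_userdata = templatefile("${path.module}/templates/userdata.sh.tpl", {
    cluster_name        = local.cluster_name
    endpoint            = module.eks.cluster_endpoint
    cluster_auth_base64 = module.eks.cluster_certificate_authority_data

    bootstrap_extra_args = ""
    kubelet_extra_args   = "--node-labels=node.kubernetes.io/lifecycle=spot,group=nginx,role=ingress-controllers --register-with-taints=dedicated=ingress-controllers:NoSchedule"
  })
}

# The launch template below would then use:
#   user_data = base64encode(local.nginx_userdata)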

resource "aws_launch_template" "nginx" {
  name_prefix            = "eks-nginx-"
  description            = "NGINX Launch-Template"
  update_default_version = true

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = 10
      volume_type           = "gp2"
      delete_on_termination = true
    }
  }

  key_name = "cabrinha-mng-test"

  monitoring {
    enabled = true
  }

  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    security_groups             = [
      module.eks.cluster_primary_security_group_id,
      module.eks.cluster_security_group_id,
      module.eks.worker_security_group_id,
    ]
  }

  user_data = base64encode(
    data.template_file.nginx.rendered,
  )

  # Supplying custom tags to EKS instances is another use-case for LaunchTemplates
  tag_specifications {
    resource_type = "instance"

    tags = {
      Name = "${local.cluster_name}-nginx"
    }
  }

  # Supplying custom tags to EKS instances' root volumes is another use-case for LaunchTemplates. (Doesn't add tags to volumes dynamically provisioned via PVCs, though.)
  tag_specifications {
    resource_type = "volume"

    tags = {
      Name = "${local.cluster_name}-nginx"
    }
  }

  # Tag the LT itself
  tags = {
    Name = "${local.cluster_name}-nginx"
  }

  lifecycle {
    create_before_destroy = true
  }
}

module "eks" {
 ...

  node_groups = {
    nginx = {
      capacity_type    = "SPOT"
      desired_capacity = 3
      max_capacity     = 3
      min_capacity     = 3

      launch_template_id      = aws_launch_template.nginx.id
      launch_template_version = aws_launch_template.nginx.default_version

      instance_types = [
        "c3.2xlarge",
        "c4.xlarge",
        "c4.2xlarge",
        "c5.2xlarge",
        "c5.xlarge",
        "c5.4xlarge",
        "m3.xlarge",
        "m3.2xlarge",
        "m4.2xlarge",
        "m5.4xlarge",
        "m5a.xlarge",
        "m5d.4xlarge",
        "r3.large",
        "r4.2xlarge",
        "r5.xlarge",
        "r5.2xlarge",
        "r5.4xlarge",
        "t3a.xlarge",
        "t3a.2xlarge",
        "t3.2xlarge",
      ]
    }
  }
}

This results in a NodeCreationFailure with "Instances failed to join the kubernetes cluster" in the AWS console.

What's the expected behavior?

Is there any way to use managed node groups with Spot instances and taints?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version:
  • OS: macOS 10.15.7
  • Terraform version: 0.12.30

Any other relevant info

#1211

@ArchiFleKs (Contributor) commented Feb 3, 2021

should be possible with #1138

@cabrinha (Contributor, Author) commented Feb 3, 2021

> should be possible with #1138

Would you mind adding an example that uses managed node groups, spot instances, labels and taints all together? It'd be super helpful.
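
Something along these lines, I'd guess (an untested sketch using the create_launch_template and kubelet_extra_args inputs added by #1138; the labels and taint values are illustrative):

  node_groups = {
    nginx = {
      create_launch_template = true
      capacity_type          = "SPOT"
      desired_capacity       = 3
      max_capacity           = 3
      min_capacity           = 3
      instance_types         = ["c5.xlarge", "m5.xlarge"]
      kubelet_extra_args     = "--node-labels=group=nginx --register-with-taints=dedicated=nginx:NoSchedule"
    }
  }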

@cabrinha (Contributor, Author) commented Apr 20, 2021

@ArchiFleKs do I need to use a launch template in order to apply taints to managed node groups instances?

It seems I can't override this parameter on a per-node-group basis: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/locals.tf#L16

Other parameters are overridden successfully; "root_volume_size" works fine, for example, although I just noticed that "root_volume_size" isn't even a parameter here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/locals.tf#L17

  node_groups = {
    prometheus = {
      create_launch_template = true

      desired_capacity   = 1
      max_capacity       = 2
      min_capacity       = 1
      instance_types     = ["c5.xlarge", "c4.xlarge"]
      capacity_type      = "SPOT"
      disk_type          = "gp2"
      disk_size          = 20
      kubelet_extra_args = "--register-with-taints=dedicated=prometheus:NoSchedule"
      additional_tags    = var.prometheus_mng_additional_tags
      k8s_labels         = {
        "role" = "worker"
        "group" = "prometheus"
      }
    }
  }

It seems that pre_userdata is not picked up on a per-node-group basis either.

@ArchiFleKs (Contributor) commented:

@cabrinha, what do you mean by "on a per-node-group basis"?

Your config seems right, yes. What is happening?

Pre user data is not working because the user data is merged with the official AMI's. I think there is discussion about this in the PR comments.

@cabrinha (Contributor, Author) commented Apr 20, 2021

> @cabrinha, what do you mean by "on a per-node-group basis"?

I need to change the taints per node group: the Prometheus group needs a different taint than the Nginx group.

node_groups = {
  prometheus = {
    kubelet_extra_args = "--register-with-taints=dedicated=prometheus:NoSchedule"
  }
  nginx = {
    kubelet_extra_args = "--register-with-taints=dedicated=nginx:NoSchedule"
  }
}

> Your config seems right, yes. What is happening?

--node-labels is the only option being rendered into --kubelet-extra-args; I don't see --register-with-taints in there.

> Pre user data is not working because the user data is merged with the official AMI's. I think there is discussion about this in the PR comments.

@ArchiFleKs can you test my configuration on your side?

@cabrinha (Contributor, Author) commented:

This explains why it's not working: #1138 (comment)
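
If I read it right: for managed node groups, EKS merges its own call to /etc/eks/bootstrap.sh into the instance user data, so a custom launch template that invokes bootstrap.sh itself (as the self-managed launch-template example does) never gets its kubelet flags applied. The kubelet_extra_args have to be injected by the module-generated launch template instead, i.e. create_launch_template = true.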

@ArchiFleKs (Contributor) commented:

I just tested with master and the following configuration:

  node_groups = {
    "default-${local.aws_region}a" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[0]]
      disk_size        = 20
    }

    "default-${local.aws_region}b" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[1]]
      disk_size        = 20
    }

    "default-${local.aws_region}c" = {
      ami_type               = "AL2_ARM_64"
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["t4g.large"]
      subnets                = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args     = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size              = 20
    }
  }

It is working as expected.
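
Worth noting for anyone reading later: EKS has since added first-class taint support for managed node groups, so with a recent AWS provider a taint can also be declared directly on the node group instead of through kubelet flags. A minimal sketch against the raw resource (the IAM role and subnet references are illustrative):

resource "aws_eks_node_group" "private" {
  cluster_name    = local.cluster_name
  node_group_name = "private"
  node_role_arn   = aws_iam_role.node.arn  # hypothetical node IAM role
  subnet_ids      = dependency.vpc.outputs.private_subnets
  capacity_type   = "SPOT"
  instance_types  = ["t4g.large"]

  scaling_config {
    desired_size = 1
    max_size     = 3
    min_size     = 1
  }

  # Applied by the EKS API itself; no custom user data needed
  taint {
    key    = "dedicated"
    value  = "private"
    effect = "NO_SCHEDULE"
  }
}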

@github-actions (bot) commented:

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 22, 2022