
Workloads deployed on node_groups nodes are unable to make calls to the internet #1089

Closed
marcosborges opened this issue Nov 6, 2020 · 18 comments · Fixed by #1094

Comments

@marcosborges

Hello guys, I just stood up an EKS cluster using the terraform-aws-eks module. At first it was a pleasant experience; using the module was very smooth.

Then I ran into a problem: I configured the module to create a node group so that certain types of applications could be deployed on it.

When I deploy applications to this node group via a node selector, they are unable to make calls outside the cluster, e.g. curl google.com.

When I remove the node selector and redo the deployment, the application lands on the standard EKS worker nodes. On those nodes the application can make calls outside the cluster.

  • bug report
  • feature request
  • [ x ] support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

Workloads deployed on the node_groups nodes are unable to make calls outside the cluster.

module "eks" {
  source = "git::https://github.com/terraform-aws-modules/terraform-aws-eks.git"
  cluster_name = local.env_prefix
  cluster_version = "1.18"
  vpc_id = module.vpc.vpc_id
  subnets = module.vpc.private_subnets
  worker_groups = [
    {
      instance_type = "m4.large"
      desired_capacity = 2
      asg_max_size  = 5
    }
  ]
  node_groups = {
    vault = {
      desired_capacity = 2
      max_capacity     = 10
      min_capacity     = 2
      instance_type = "m4.large"
      k8s_labels = {
        Vault       = "true"
      }
      additional_tags = {
        Vault       = "true"
      }
    }
  }
}

I started by checking the subnets where the EC2 instances for the node group were being launched; they turned out to be the same subnets used by the worker group nodes.

To create the VPC I used the terraform-aws-modules/terraform-aws-vpc module.

I then checked whether it was something in the security groups; the rules for the node group are the same as those for the worker nodes.

I also validated the IAM roles, and again they were the same.

I need a hint, tip, direction, or smoke signal to continue building my environment.

I will be extremely grateful for the help.

@barryib
Member

barryib commented Nov 6, 2020

Hmm. Can you please check whether you have a NAT gateway attached to your private subnets? Can you please share your vpc module configuration?

You can also have a look at examples/managed_node_groups for a working example. That will probably help you figure out what's wrong in your deployment.
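
For reference, a minimal sketch of a terraform-aws-modules/terraform-aws-vpc configuration with NAT for the private subnets (the name, AZs and CIDRs here are placeholders, not taken from this issue):

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "eks-vpc"      # placeholder
  cidr = "10.0.0.0/16"  # placeholder

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # Nodes in the private subnets can only reach the internet through a NAT gateway.
  enable_nat_gateway = true
  single_nat_gateway = true

  enable_dns_hostnames = true
  enable_dns_support   = true
}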

@ScubaDrew

I am having the same problem. I found that the node_groups security group does not have the correct inbound rules.

@barryib
Copy link
Member

barryib commented Nov 8, 2020

Can you please elaborate? Which SG rules are missing?

@ScubaDrew

node_groups security group
[screenshot: node_groups security group inbound rules]

worker_groups security group:
[screenshot: worker_groups security group inbound rules]

The internet is not accessible with the first SG attached, but if I attach the worker_groups SG, things work correctly.

@barryib
Member

barryib commented Nov 9, 2020

I think it's because you have an egress rule that allows internet traffic in the worker SG. That rule is created for the worker SG by this module. @marcosborges @ScubaDrew, can you confirm please?

I don't use MNG at all, and when I go through the code, I don't understand why this issue is only surfacing now.

@ScubaDrew

They both have the same egress rule:
[screenshot: identical egress rule on both security groups]

@barryib
Member

barryib commented Nov 10, 2020

I just tested internet access from within a managed node group and everything works as expected.

I was wondering what you mean by "unable to make calls to the internet"? Is it a DNS issue, or is your DNS resolution working correctly and you're just having trouble reaching the internet?

If you have a DNS issue, I suspect that your CoreDNS pods are running in your worker groups and the pods in your managed node groups can't reach them. This is because there are no rules to allow communication between worker groups and managed node groups by default. To allow that, you can set var.worker_create_cluster_primary_security_group_rules = true, as in the sketch below.
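
A minimal sketch of where that variable goes, reusing the module call from the issue description (other arguments trimmed):

module "eks" {
  source          = "git::https://github.com/terraform-aws-modules/terraform-aws-eks.git"
  cluster_name    = local.env_prefix
  cluster_version = "1.18"
  vpc_id          = module.vpc.vpc_id
  subnets         = module.vpc.private_subnets

  # Create SG rules that let pods on self-managed workers and pods using the
  # cluster primary SG (managed node groups) talk to each other, e.g. so
  # CoreDNS running on one side is reachable from the other.
  worker_create_cluster_primary_security_group_rules = true

  worker_groups = [
    # ... as above ...
  ]
  node_groups = {
    # ... as above ...
  }
}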

@ScubaDrew

@barryib I think you are right - the issue is DNS. CoreDNS is not running on the node_groups nodes.

It seems the node_groups nodes do not get permission to talk to the other nodes in the cluster.

As I showed above, worker_groups get:
[screenshot: worker_groups security group rules]

worker_create_cluster_primary_security_group_rules does not really sound like what we need/want. We want node_groups to be able to talk to the rest of the cluster for DNS, I guess... or to have CoreDNS running on them? I'm not sure what is best.

@barryib
Member

barryib commented Nov 10, 2020

Here is the description of worker_create_cluster_primary_security_group_rules:

"Whether to create security group rules to allow communication between pods on workers and pods using the primary cluster security group."

It means that it allows communication between pods in the worker groups SG and pods in the managed node groups SG. MNG uses the cluster primary SG (this was introduced in EKS 1.14).
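
To illustrate, here is a rough sketch of the kind of rule that flag enables (illustrative only, not the module's exact resources; it assumes the module exposes the worker_security_group_id and cluster_primary_security_group_id outputs):

# Illustrative only: let pods behind the cluster primary SG (managed node groups)
# reach pods on the self-managed workers. The module manages rules like this in
# both directions when the flag is set.
resource "aws_security_group_rule" "workers_ingress_from_cluster_primary" {
  description              = "Allow pods using the cluster primary SG to reach pods on workers"
  type                     = "ingress"
  protocol                 = "-1"
  from_port                = 0
  to_port                  = 0
  security_group_id        = module.eks.worker_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}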

@ScubaDrew

Got it. I'll add that then! Thank you.

The example does not have it - https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/managed_node_groups/main.tf - so DNS wouldn't work there, right?

Thanks again

@barryib
Member

barryib commented Nov 10, 2020

Oh, good catch. Can you please test it and see if it solves your issue, or even better, confirm that the example you linked is not working as expected and open a PR to update the example/FAQ?

@barryib
Member

barryib commented Nov 10, 2020

Oh sorry. That example works. It's quite late here ^^

That example works because you don't have both worker groups and managed node groups => your CoreDNS pods run in your MNG, which already shares the same primary SG.

This issue comes up when you have worker groups and MNG, plus your CoreDNS scheduled on one side of your cluster (in your case, on your self-managed worker groups).

@ScubaDrew

Confirmed: worker_create_cluster_primary_security_group_rules fixes things when you have both worker and MNG. Thanks @barryib

@barryib
Member

barryib commented Nov 10, 2020

Great. Can you please review #1094?

@adilsonmenechini

Use the public network.

subnets = module.vpc.public_subnets

@barryib
Member

barryib commented Nov 10, 2020

Use the public network.

subnets = module.vpc.public_subnets

Can you please elaborate? How would using public subnets open communication between pods scheduled on managed node groups and those on self-managed worker groups?

@barryib
Member

barryib commented Nov 11, 2020

Great. Can you please review #1094?

cc @ScubaDrew

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022