Deployment never completes but cluster is active #777

Closed · 1 of 4 tasks
Tomthi opened this issue Mar 11, 2020 · 7 comments


Tomthi commented Mar 11, 2020

I have issues

The same code was used to build two other clusters and those are working. Some time had passed since those initial clusters were built.

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request - read the FAQ first!
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

The cluster deploy never completes with worker nodes and ASGs. The cluster state does show active, but it seems the ASGs and worker nodes are never deployed.

If this is a bug, how to reproduce? Please include a code sample if relevant.

When running apply, the cluster is built and reaches an active state, but the node groups are never created: there are no ASGs or launch templates.

It's stuck in a loop:
module.eks-cluster.null_resource.wait_for_cluster[0]: Still creating...
module.eks-cluster.null_resource.wait_for_cluster[0]: Still creating...
module.eks-cluster.null_resource.wait_for_cluster[0]: Still creating...
module.eks-cluster.null_resource.wait_for_cluster[0]: Still creating...

Main.tf

module "eks-cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "9.0.0"

  cluster_name                    = "qa-eps-eks"
  subnets                         = data.aws_subnet_ids.default.ids
  vpc_id                          = var.vpc_id
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false
  cluster_security_group_id       = data.aws_security_group.default.id

  worker_groups = [
    {
      name                = "qa-eks-workers"
      instance_type       = "m4.large"
      key_name            = "qa"
      asg_min_size        = 2
      asg_desired_size    = 2
      asg_max_size        = 5
      autoscaling_enabled = true

      tags = [{
        propagate_at_launch = true
        key                 = "terraform"
        value               = "true"
      }]
    }
  ]

  tags = {
    environment = "qa"
    terraform   = "true"
  }
}

What's the expected behavior?

The expected behavior is that the cluster is built along with the ASGs and worker nodes.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version:
  • OS:
  • Terraform version: 0.12.23

Any other relevant info

Module v9.0.0


js-timbirkett commented Mar 12, 2020

Do you have wget installed? The command that gets run is: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/variables.tf#L204
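For context, the default of that wait_for_cluster_cmd variable is roughly the following (paraphrased here; see the linked variables.tf for the exact wording of the description):

variable "wait_for_cluster_cmd" {
  description = "Custom local-exec command for determining if the EKS cluster is healthy. The cluster endpoint is available as the environment variable ENDPOINT."
  type        = string
  default     = "until wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null; do sleep 4; done"
}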

I'm not sure how the module would behave in terms of redirecting stderr, but testing with a non-existent command in place of wget on my machine:

❯ until doesnotexist --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null; do echo "sleeping" ; sleep 4; done
zsh: command not found: doesnotexist
sleeping
zsh: command not found: doesnotexist
sleeping
zsh: command not found: doesnotexist
sleeping
zsh: command not found: doesnotexist
sleeping

It runs forever, as far as I can see.


daroga0002 commented Mar 12, 2020

If wget were not available, it would fail outright rather than hang. In this case I suggest checking the wget version, as there is a known issue with SSL compatibility, and verifying that the endpoint is really reachable from the place where you run the Terraform code: a private endpoint must be reached from within the same VPC, and for a public endpoint you must check the whitelist configured under

variable "cluster_endpoint_public_access_cidrs" {
  description = "List of CIDR blocks which can access the Amazon EKS public API server endpoint."
  type        = list(string)
  default     = ["0.0.0.0/0"]
}
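As a usage sketch (the CIDR below is a placeholder, not taken from this issue), that whitelist would be set in the module call like this:

module "eks-cluster" {
  # ...other arguments as in Main.tf above...

  cluster_endpoint_public_access = true
  # Placeholder range: replace with the network Terraform actually runs from.
  cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"]
}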

Please also check #757.

@js-timbirkett

I think I see the problem:

cluster_endpoint_private_access = true
cluster_endpoint_public_access = false


Tomthi commented Mar 12, 2020

@js-timbirkett That seems to have fixed it. I reverted to v8.1.0, then went back to v9.0.0 and removed the one line:

cluster_endpoint_public_access = false

It also didn't like the following, so I had to fix that (see the note after the block):
asg_min_size = 2
asg_desired_size = 2
asg_max_size = 5
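(For reference: assuming the module's worker-group schema is as I remember it, the desired-count key is asg_desired_capacity rather than asg_desired_size, so the corrected entry would presumably be:)

worker_groups = [
  {
    name          = "qa-eks-workers"
    instance_type = "m4.large"
    key_name      = "qa"
    asg_min_size  = 2
    # Assumed key name: the module reads asg_desired_capacity, not asg_desired_size.
    asg_desired_capacity = 2
    asg_max_size         = 5
  }
]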

So everything looks like it's provisioned, but it throws an error at the end that the auth map already exists. I cleared out the authmap file that the module drops in the root and deleted .terraform, but it still throws that error.
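(A common way out of the "aws-auth ConfigMap already exists" error, assuming the ConfigMap exists in the cluster but is not tracked in Terraform state, is to import it rather than delete local files. The resource address below is an educated guess; check it against terraform state list first:)

# Hypothetical address: confirm with `terraform state list` before importing.
terraform import 'module.eks-cluster.kubernetes_config_map.aws_auth[0]' kube-system/aws-auth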


stale bot commented Jun 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 10, 2020

stale bot commented Jul 10, 2020

This issue has been automatically closed because it has not had recent activity since being marked as stale.

stale bot closed this as completed Jul 10, 2020
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 25, 2022