[aws-eks] EKS cluster got deleted because of the name conflict in two different cdk apps #5501

Closed
tamizharasanr opened this issue Dec 20, 2019 · 0 comments · Fixed by #5540
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. p1

@tamizharasanr

An existing EKS cluster got deleted when I tried to deploy a new one. I was installing a new EKS cluster in the same region as, but in a different VPC from, an EKS cluster that was already running. Both clusters were created by two different CDK apps.

Reproduction Steps

  1. Deploy a CDK app that creates an EKS cluster with an explicit name, e.g. `devEKSCluster`.
  2. Deploy a second, differently named CDK app that creates an EKS cluster with the same name as the first, e.g. `devEKSCluster`.
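import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

// current_env and eksClusterName were not shown in the report; these are
// placeholder values that reproduce the conflicting name "devEKSCluster".
const current_env = 'dev';
const eksClusterName = 'EKSCluster';

export class EKSStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // Both stack instances request the same physical cluster name.
    new eks.Cluster(this, "devEKSCluster", {
      clusterName: current_env + eksClusterName,
    });
  }
}

// Instantiate the stacks after the class is declared.
const app = new cdk.App();
new EKSStack(app, "test1");
new EKSStack(app, "test2");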
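Because `clusterName` sets the physical name of the cluster, any two stacks or apps that resolve it to the same string in the same account and region will collide, as the two `EKSStack` instances in the snippet do. The simplest workaround is to omit `clusterName` so that a name is generated. If explicit names are required, a sketch of one approach is to fold the stack name into the cluster name. `clusterNameFor` is a hypothetical helper, and the character and length rules it applies are assumptions about EKS naming limits.

```typescript
// Hypothetical helper (not part of the CDK): builds a cluster name that is
// unique per stack, so two stacks cannot both claim "devEKSCluster".
function clusterNameFor(env: string, stackName: string, base: string): string {
  const raw = `${env}-${stackName}-${base}`;
  // Assumed EKS rule: only letters, digits, hyphens and underscores.
  const sanitized = raw.replace(/[^A-Za-z0-9_-]/g, '-');
  // Assumed EKS rule: at most 100 characters.
  return sanitized.slice(0, 100);
}

// Inside EKSStack this could be passed as:
//   clusterName: clusterNameFor(current_env, this.stackName, eksClusterName)
console.log(clusterNameFor('dev', 'test1', 'EKSCluster')); // dev-test1-EKSCluster
console.log(clusterNameFor('dev', 'test2', 'EKSCluster')); // dev-test2-EKSCluster
```

Truncating to 100 characters could in principle reintroduce a collision for very long stack names, so a short hash suffix may be preferable in practice.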

Error Log

PS D:\work\cdk-apps\eks> cdk deploy
[Warning at /TestEksStack/EKSCluster] Could not auto-tag private subnets with "kubernetes.io/role/internal-elb=1", please remember to do this manually
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:

IAM Statement Changes
┌───┬────────────────────────────────────────────────────────┬────────┬───────────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────┬───────────┐
│   │ Resource                                               │ Effect │ Action                                                                        │ Principal                                              │ Condition │
├───┼────────────────────────────────────────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┼───────────┤
│ + │ ${EKSCluster/ClusterRole.Arn}                          │ Allow  │ sts:AssumeRole                                                                │ Service:eks.amazonaws.com                              │           │
│ + │ ${EKSCluster/ClusterRole.Arn}                          │ Allow  │ iam:PassRole                                                                  │ AWS:${EKSCluster/Resource/ResourceHandler/ServiceRole} │           │
├───┼────────────────────────────────────────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┼───────────┤
│ + │ ${EKSCluster/Resource/ResourceHandler/ServiceRole.Arn} │ Allow  │ sts:AssumeRole                                                                │ Service:lambda.amazonaws.com                           │           │
├───┼────────────────────────────────────────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┼───────────┤
│ + │ ${TestEksStackAdminRole.Arn}                           │ Allow  │ sts:AssumeRole                                                                │ AWS:arn:${AWS::Partition}:iam::533263886719:root       │           │
├───┼────────────────────────────────────────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┼───────────┤
│ + │ ${devASG/InstanceRole.Arn}                             │ Allow  │ sts:AssumeRole                                                                │ Service:ec2.amazonaws.com                              │           │
├───┼────────────────────────────────────────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┼───────────┤
│ + │ *                                                      │ Allow  │ eks:CreateCluster                                                             │ AWS:${EKSCluster/Resource/ResourceHandler/ServiceRole} │           │
│   │                                                        │        │ eks:DeleteCluster                                                             │                                                        │           │
│   │                                                        │        │ eks:DescribeCluster                                                           │                                                        │           │
│   │                                                        │        │ eks:UpdateClusterVersion                                                      │                                                        │           │
└───┴────────────────────────────────────────────────────────┴────────┴───────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────┴───────────┘
IAM Policy Changes
┌───┬────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────┐
│   │ Resource                                           │ Managed Policy ARN                                                             │
├───┼────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ + │ ${EKSCluster/ClusterRole}                          │ arn:${AWS::Partition}:iam::aws:policy/AmazonEKSClusterPolicy                   │
│ + │ ${EKSCluster/ClusterRole}                          │ arn:${AWS::Partition}:iam::aws:policy/AmazonEKSServicePolicy                   │
├───┼────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ + │ ${EKSCluster/Resource/ResourceHandler/ServiceRole} │ arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole │
├───┼────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ + │ ${devASG/InstanceRole}                             │ arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy                │
│ + │ ${devASG/InstanceRole}                             │ arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy                     │
│ + │ ${devASG/InstanceRole}                             │ arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly       │
└───┴────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────┘
Security Group Changes
┌───┬─────────────────────────────────────────────────┬─────┬────────────────┬─────────────────────────────────────────────────┐
│   │ Group                                           │ Dir │ Protocol       │ Peer                                            │
├───┼─────────────────────────────────────────────────┼─────┼────────────────┼─────────────────────────────────────────────────┤
│ + │ ${EKSCluster/ControlPlaneSecurityGroup.GroupId} │ In  │ TCP 443        │ ${devASG/InstanceSecurityGroup.GroupId}         │
│ + │ ${EKSCluster/ControlPlaneSecurityGroup.GroupId} │ Out │ Everything     │ Everyone (IPv4)                                 │
├───┼─────────────────────────────────────────────────┼─────┼────────────────┼─────────────────────────────────────────────────┤
│ + │ ${devASG/InstanceSecurityGroup.GroupId}         │ In  │ Everything     │ ${devASG/InstanceSecurityGroup.GroupId}         │
│ + │ ${devASG/InstanceSecurityGroup.GroupId}         │ In  │ TCP 443        │ ${EKSCluster/ControlPlaneSecurityGroup.GroupId} │
│ + │ ${devASG/InstanceSecurityGroup.GroupId}         │ In  │ TCP 1025-65535 │ ${EKSCluster/ControlPlaneSecurityGroup.GroupId} │
│ + │ ${devASG/InstanceSecurityGroup.GroupId}         │ Out │ Everything     │ Everyone (IPv4)                                 │
└───┴─────────────────────────────────────────────────┴─────┴────────────────┴─────────────────────────────────────────────────┘
(NOTE: There may be security-related changes not in this list. See https://github.com/aws/aws-cdk/issues/1299)

Do you wish to deploy these changes (y/n)? y
TestEksStack: deploying...
TestEksStack: creating CloudFormation changeset...
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA)
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/Resource/ResourceHandler/ServiceRole (EKSClusterResourceHandlerServiceRoleFD631254)
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/ClusterRole (EKSClusterClusterRoleB72F3251)
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | TestEksStackAdminRole (TestEksStackAdminRole889EDF60)
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::CDK::Metadata                    | CDKMetadata
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::EC2::SecurityGroup               | EKSCluster/ControlPlaneSecurityGroup (EKSClusterControlPlaneSecurityGroup580AD1FE)
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/Resource/ResourceHandler/ServiceRole (EKSClusterResourceHandlerServiceRoleFD631254) Resource creation Initiated
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/ClusterRole (EKSClusterClusterRoleB72F3251) Resource creation Initiated
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::IAM::Role                        | TestEksStackAdminRole (TestEksStackAdminRole889EDF60) Resource creation Initiated
  0/21 | 3:26:34 PM | CREATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA) Resource creation Initiated
  0/21 | 3:26:35 PM | CREATE_IN_PROGRESS   | AWS::CDK::Metadata                    | CDKMetadata Resource creation Initiated
  1/21 | 3:26:35 PM | CREATE_COMPLETE      | AWS::CDK::Metadata                    | CDKMetadata
  1/21 | 3:26:38 PM | CREATE_IN_PROGRESS   | AWS::EC2::SecurityGroup               | EKSCluster/ControlPlaneSecurityGroup (EKSClusterControlPlaneSecurityGroup580AD1FE) Resource creation Initiated
  2/21 | 3:26:39 PM | CREATE_COMPLETE      | AWS::EC2::SecurityGroup               | EKSCluster/ControlPlaneSecurityGroup (EKSClusterControlPlaneSecurityGroup580AD1FE)
  3/21 | 3:26:48 PM | CREATE_COMPLETE      | AWS::IAM::Role                        | TestEksStackAdminRole (TestEksStackAdminRole889EDF60)
  4/21 | 3:26:48 PM | CREATE_COMPLETE      | AWS::IAM::Role                        | EKSCluster/Resource/ResourceHandler/ServiceRole (EKSClusterResourceHandlerServiceRoleFD631254)
  5/21 | 3:26:48 PM | CREATE_COMPLETE      | AWS::IAM::Role                        | EKSCluster/ClusterRole (EKSClusterClusterRoleB72F3251)
  5/21 | 3:26:50 PM | CREATE_IN_PROGRESS   | AWS::IAM::Policy                      | EKSCluster/Resource/ResourceHandler/ServiceRole/DefaultPolicy (EKSClusterResourceHandlerServiceRoleDefaultPolicy4D087A98)
  5/21 | 3:26:51 PM | CREATE_IN_PROGRESS   | AWS::IAM::Policy                      | EKSCluster/Resource/ResourceHandler/ServiceRole/DefaultPolicy (EKSClusterResourceHandlerServiceRoleDefaultPolicy4D087A98) Resource creation Initiated
  6/21 | 3:26:56 PM | CREATE_COMPLETE      | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA)
  7/21 | 3:27:04 PM | CREATE_COMPLETE      | AWS::IAM::Policy                      | EKSCluster/Resource/ResourceHandler/ServiceRole/DefaultPolicy (EKSClusterResourceHandlerServiceRoleDefaultPolicy4D087A98)
  7/21 | 3:27:05 PM | CREATE_IN_PROGRESS   | AWS::Lambda::Function                 | EKSCluster/Resource/ResourceHandler (EKSClusterResourceHandler31198B21)
  7/21 | 3:27:10 PM | CREATE_IN_PROGRESS   | AWS::Lambda::Function                 | EKSCluster/Resource/ResourceHandler (EKSClusterResourceHandler31198B21) Resource creation Initiated
  8/21 | 3:27:11 PM | CREATE_COMPLETE      | AWS::Lambda::Function                 | EKSCluster/Resource/ResourceHandler (EKSClusterResourceHandler31198B21)
  8/21 | 3:27:12 PM | CREATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | EKSCluster/Resource/Resource/Default (EKSClusterE11008B6)
  8/21 | 3:27:15 PM | CREATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | EKSCluster/Resource/Resource/Default (EKSClusterE11008B6) Resource creation Initiated
  9/21 | 3:27:15 PM | CREATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | EKSCluster/Resource/Resource/Default (EKSClusterE11008B6) Failed to create resource. An error occurred (ResourceInUseException) when calling the CreateCluster operation: Cluster already exists with name: devEKSCluster
        new CustomResource (D:\work\cdk-apps\eks\node_modules\@aws-cdk\aws-cloudformation\lib\custom-resource.ts:163:21)
        \_ new ClusterResource (D:\work\cdk-apps\eks\node_modules\@aws-cdk\aws-eks\lib\cluster-resource.ts:69:22)
        \_ new Cluster (D:\work\cdk-apps\eks\node_modules\@aws-cdk\aws-eks\lib\cluster.ts:377:18)
        \_ new EKSStack (D:\work\cdk-apps\eks\lib\eks-stack.ts:79:21)
        \_ Object.<anonymous> (D:\work\cdk-apps\eks\bin\eks.ts:8:1)
        \_ Module._compile (internal/modules/cjs/loader.js:701:30)
        \_ Module.m._compile (D:\work\cdk-apps\eks\node_modules\ts-node\src\index.ts:530:23)
        \_ Module._extensions..js (internal/modules/cjs/loader.js:712:10)
        \_ Object.require.extensions.(anonymous function) [as .ts] (D:\work\cdk-apps\eks\node_modules\ts-node\src\index.ts:533:12)
        \_ Module.load (internal/modules/cjs/loader.js:600:32)
        \_ tryModuleLoad (internal/modules/cjs/loader.js:539:12)
        \_ Function.Module._load (internal/modules/cjs/loader.js:531:3)
        \_ Function.Module.runMain (internal/modules/cjs/loader.js:754:12)
        \_ main (D:\work\cdk-apps\eks\node_modules\ts-node\src\bin.ts:212:14)
        \_ Object.<anonymous> (D:\work\cdk-apps\eks\node_modules\ts-node\src\bin.ts:470:3)
        \_ Module._compile (internal/modules/cjs/loader.js:701:30)
        \_ Object.Module._extensions..js (internal/modules/cjs/loader.js:712:10)
        \_ Module.load (internal/modules/cjs/loader.js:600:32)
        \_ tryModuleLoad (internal/modules/cjs/loader.js:539:12)
        \_ Function.Module._load (internal/modules/cjs/loader.js:531:3)
        \_ Function.Module.runMain (internal/modules/cjs/loader.js:754:12)
        \_ findNodeScript.then.existing (C:\Users\tamizharasanr\AppData\Roaming\npm\node_modules\npm\node_modules\libnpx\index.js:268:14)
  9/21 | 3:27:15 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack            | TestEksStack The following resource(s) failed to create: [EKSClusterE11008B6]. . Rollback requested by user.
  9/21 | 3:27:38 PM | DELETE_IN_PROGRESS   | AWS::IAM::Role                        | TestEksStackAdminRole (TestEksStackAdminRole889EDF60)
  9/21 | 3:27:38 PM | DELETE_IN_PROGRESS   | AWS::CDK::Metadata                    | CDKMetadata
  9/21 | 3:27:38 PM | DELETE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | EKSCluster/Resource/Resource/Default (EKSClusterE11008B6)
 10/21 | 3:27:39 PM | DELETE_COMPLETE      | AWS::CDK::Metadata                    | CDKMetadata
 11/21 | 3:27:39 PM | DELETE_COMPLETE      | AWS::IAM::Role                        | TestEksStackAdminRole (TestEksStackAdminRole889EDF60)
11/21 Currently in progress: TestEksStack, EKSClusterE11008B6
 12/21 | 3:37:17 PM | DELETE_COMPLETE      | Custom::AWSCDK-EKS-Cluster            | EKSCluster/Resource/Resource/Default (EKSClusterE11008B6)
 12/21 | 3:37:18 PM | DELETE_IN_PROGRESS   | AWS::EC2::SecurityGroup               | EKSCluster/ControlPlaneSecurityGroup (EKSClusterControlPlaneSecurityGroup580AD1FE)
 12/21 | 3:37:18 PM | DELETE_IN_PROGRESS   | AWS::Lambda::Function                 | EKSCluster/Resource/ResourceHandler (EKSClusterResourceHandler31198B21)
 13/21 | 3:37:19 PM | DELETE_COMPLETE      | AWS::Lambda::Function                 | EKSCluster/Resource/ResourceHandler (EKSClusterResourceHandler31198B21)
 14/21 | 3:37:19 PM | DELETE_COMPLETE      | AWS::EC2::SecurityGroup               | EKSCluster/ControlPlaneSecurityGroup (EKSClusterControlPlaneSecurityGroup580AD1FE)
 14/21 | 3:37:19 PM | DELETE_IN_PROGRESS   | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA)
 14/21 | 3:37:19 PM | DELETE_IN_PROGRESS   | AWS::IAM::Policy                      | EKSCluster/Resource/ResourceHandler/ServiceRole/DefaultPolicy (EKSClusterResourceHandlerServiceRoleDefaultPolicy4D087A98)
 15/21 | 3:37:20 PM | DELETE_COMPLETE      | AWS::IAM::Policy                      | EKSCluster/Resource/ResourceHandler/ServiceRole/DefaultPolicy (EKSClusterResourceHandlerServiceRoleDefaultPolicy4D087A98)
 15/21 | 3:37:20 PM | DELETE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/Resource/ResourceHandler/ServiceRole (EKSClusterResourceHandlerServiceRoleFD631254)
 15/21 | 3:37:20 PM | DELETE_IN_PROGRESS   | AWS::IAM::Role                        | EKSCluster/ClusterRole (EKSClusterClusterRoleB72F3251)
 16/21 | 3:37:22 PM | DELETE_COMPLETE      | AWS::IAM::Role                        | EKSCluster/ClusterRole (EKSClusterClusterRoleB72F3251)
 17/21 | 3:37:22 PM | DELETE_COMPLETE      | AWS::IAM::Role                        | EKSCluster/Resource/ResourceHandler/ServiceRole (EKSClusterResourceHandlerServiceRoleFD631254)

 ❌  TestEksStack failed: Error: The stack named TestEksStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE
The stack named TestEksStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE
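The log shows the sequence behind the report: CreateCluster fails with ResourceInUseException because a cluster named devEKSCluster already exists, CloudFormation starts a rollback, and the rollback's Delete of the same custom resource (EKSClusterE11008B6) runs from 3:27:38 PM to 3:37:17 PM, which is consistent with the pre-existing cluster being deleted. A minimal sketch of the guard a custom resource handler needs here, assuming the handler deletes by physical resource ID: the ID only becomes the cluster name once CreateCluster succeeds, and a failed Create reports a sentinel ID that Delete ignores. The request fields mirror CloudFormation custom resource requests; `handle`, `ClusterApi`, and `FAILED_CREATE_ID` are hypothetical, and this is not the code from the fix in #5540.

```typescript
// Subset of the CloudFormation custom resource request fields used here.
interface CustomResourceEvent {
  RequestType: 'Create' | 'Update' | 'Delete';
  PhysicalResourceId?: string;
  ResourceProperties: { Name: string };
}

interface HandlerResult {
  Status: 'SUCCESS' | 'FAILED';
  PhysicalResourceId: string;
  Reason?: string;
}

// Hypothetical API surface; a real handler would call the EKS API here.
interface ClusterApi {
  createCluster(name: string): Promise<void>;
  deleteCluster(name: string): Promise<void>;
}

// Sentinel reported when Create fails, so the rollback Delete cannot
// match the name of a cluster this resource never created.
const FAILED_CREATE_ID = 'create-failed';

async function handle(event: CustomResourceEvent, api: ClusterApi): Promise<HandlerResult> {
  const name = event.ResourceProperties.Name;
  switch (event.RequestType) {
    case 'Create':
      try {
        await api.createCluster(name);
        // Only after a successful create does this resource own the name.
        return { Status: 'SUCCESS', PhysicalResourceId: name };
      } catch (err) {
        return { Status: 'FAILED', PhysicalResourceId: FAILED_CREATE_ID, Reason: String(err) };
      }
    case 'Delete': {
      const id = event.PhysicalResourceId ?? FAILED_CREATE_ID;
      if (id !== FAILED_CREATE_ID) {
        // Delete by the ID recorded at creation time, never by the requested name.
        await api.deleteCluster(id);
      }
      return { Status: 'SUCCESS', PhysicalResourceId: id };
    }
    default:
      // Update handling is out of scope for this sketch.
      return { Status: 'SUCCESS', PhysicalResourceId: event.PhysicalResourceId ?? name };
  }
}
```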

Environment

  • CLI Version : 1.18.0 (build bc924bc)
  • Framework Version: "@aws-cdk/aws-eks": "^1.15.0"
  • OS : Windows 10
  • Language : typescript


@tamizharasanr tamizharasanr added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 20, 2019
@SomayaB SomayaB added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Dec 20, 2019
@eladb eladb added the p1 label Dec 22, 2019
eladb pushed a commit that referenced this issue Dec 30, 2019
There were two causes of timeouts during EKS cluster creation: creating a cluster can take longer than the AWS Lambda timeout (15 minutes), and `kubectl apply` was not retried after the cluster had been created.

This change fixes the first issue by using the custom resource provider framework to implement the cluster resource as an asynchronous resource. The custom resource providers are now bundled as nested stacks, so they don't take up too many resources in the user's stack, and they are reused by all clusters within the same stack. This required the creation role to be different from the Lambda execution role, so that role is defined separately and assumed within the providers.
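The asynchronous resource described above splits cluster creation into a short call that starts the work and a status check that is invoked repeatedly until the work is done, so no single Lambda invocation has to outlast the 15-minute limit. In the CDK provider framework these are the `onEvent` and `isComplete` handlers, and `isComplete` answers with an `IsComplete` flag. The sketch below only simulates that control flow: `FakeEks`, `waitForCompletion`, and the poll counts are hypothetical, and the real framework schedules the polling itself (using a Step Functions state machine) rather than running a loop like this.

```typescript
// Shape of the isComplete handler's response in the CDK provider framework.
interface IsCompleteResult {
  IsComplete: boolean;
}

// Hypothetical stand-in for EKS: a cluster becomes ACTIVE after some polls.
class FakeEks {
  private remainingPolls = new Map<string, number>();

  startCreate(name: string, pollsUntilActive: number): void {
    this.remainingPolls.set(name, pollsUntilActive);
  }

  status(name: string): 'CREATING' | 'ACTIVE' {
    const left = this.remainingPolls.get(name) ?? 0;
    if (left > 0) {
      this.remainingPolls.set(name, left - 1);
      return 'CREATING';
    }
    return 'ACTIVE';
  }
}

// onEvent: starts creation and returns immediately (short-lived invocation).
function onEvent(eks: FakeEks, name: string, pollsUntilActive: number): { PhysicalResourceId: string } {
  eks.startCreate(name, pollsUntilActive);
  return { PhysicalResourceId: name };
}

// isComplete: one cheap status check per invocation.
function isComplete(eks: FakeEks, name: string): IsCompleteResult {
  return { IsComplete: eks.status(name) === 'ACTIVE' };
}

// Hypothetical driver; returns the poll on which the cluster became ACTIVE.
function waitForCompletion(eks: FakeEks, name: string, maxPolls: number): number {
  for (let poll = 1; poll <= maxPolls; poll++) {
    if (isComplete(eks, name).IsComplete) {
      return poll;
    }
  }
  throw new Error(`${name} did not become ACTIVE within ${maxPolls} polls`);
}

const demo = new FakeEks();
const { PhysicalResourceId } = onEvent(demo, 'devEKSCluster', 3);
console.log(waitForCompletion(demo, PhysicalResourceId, 10)); // 4
```

Each invocation stays short; only the total wait, bounded by `maxPolls` here, grows with the cluster's creation time.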

The second issue is fixed by adding 3 retries to "kubectl apply".
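A bounded retry like the one added around `kubectl apply` can be sketched as follows. `withRetries`, the fixed delay, and the stand-in for `kubectl apply` are hypothetical; whether "3 retries" means three attempts in total or three after the first one is not stated here, so the sketch takes the attempt count as a parameter.

```typescript
// Hypothetical helper: run an async operation up to maxAttempts times,
// waiting delayMs between attempts. Rethrows the last error if all fail.
async function withRetries<T>(
  op: (attempt: number) => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Stand-in for `kubectl apply` that fails until the API server is reachable.
let calls = 0;
const fakeApply = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('connection refused');
  return 'applied';
};
withRetries(fakeApply, 3, 10).then((result) => console.log(result, calls)); // applied 3
```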

**Backwards compatibility**: as described in #5544, since the resource provider handler of `Cluster` and `KubernetesResource` has been changed, this change requires a replacement of existing clusters (deployment fails with "service token cannot be changed" error). Since this can be disruptive to users, this change includes an exact copy of the previous version under a new module called `@aws-cdk/aws-eks-legacy`, which can be used as a drop-in replacement until users decide to upgrade to the new version. Using the legacy cluster will emit a synthesis warning that this module will no longer be released as part of the CDK starting March 1st, 2020.

- Fixes #4087
- Fixes #4695
- Fixes #5259
- Fixes #5501

---

BREAKING CHANGE: (in experimental module) the providers behind the AWS EKS module have been rewritten to address multiple stability issues. Since this change requires cluster replacement, the old version of this module is available under `@aws-cdk/aws-eks-legacy`. Please read #5544 carefully for upgrade instructions.
@eladb eladb added the in-progress This issue is being actively worked on. label Dec 30, 2019
@mergify mergify bot closed this as completed in #5540 Dec 30, 2019
mergify bot added a commit that referenced this issue Dec 30, 2019
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@iliapolo iliapolo changed the title EKS cluster got deleted because of the name conflict in two different cdk apps [aws-eks] EKS cluster got deleted because of the name conflict in two different cdk apps Aug 16, 2020
@iliapolo iliapolo removed in-progress This issue is being actively worked on. needs-triage This issue or PR still needs to be triaged. labels Aug 16, 2020