eks: authentication mode failed to update #31032

Closed
gokendra1 opened this issue Aug 6, 2024 · 5 comments · Fixed by #31043
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@gokendra1

Describe the bug

A CloudFormation stack (created using the AWS CDK) that manages an EKS cluster is no longer in sync with the actual cluster because of a failed deployment.

Steps that caused this issue:

  • Initially the stack is deployed correctly and the EKS cluster has the configuration accessConfig: {}.
  • A new deployment of the stack is performed with the updated EKS cluster configuration accessConfig: {"authenticationMode":"API_AND_CONFIG_MAP"}, along with updates to some other resources.
  • After the EKS cluster property was updated, the deployment failed because of the other resources (out of scope for this request).
  • The stack tried to roll back but failed, because "Switching authentication modes on an existing cluster is a one-way operation" and "Once the access entry method is enabled, it cannot be disabled." (https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html#set-cam).
  • The update rollback was continued while skipping the EKS cluster resource, so the template still records the EKS cluster configuration accessConfig: {}.
  • Deploying the stack again (without any other resource updates) with the new EKS cluster configuration now fails with "Unsupported authentication mode update from API_AND_CONFIG_MAP to API_AND_CONFIG_MAP", because the actual resource already has the new configuration.
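
To make the drift concrete, here is a minimal sketch of the state after the skipped rollback. It assumes the update decision is made by comparing the old and new template properties only; `templateDiffRequestsUpdate` and the `AccessConfig` shape are hypothetical names for illustration, not the handler's actual code.

```ts
type AuthMode = 'CONFIG_MAP' | 'API_AND_CONFIG_MAP' | 'API';

interface AccessConfig {
  authenticationMode?: AuthMode;
}

// Hypothetical helper: decides whether an update call is issued purely from
// the old and new template properties, without looking at the live cluster.
function templateDiffRequestsUpdate(oldProps: AccessConfig, newProps: AccessConfig): boolean {
  return oldProps.authenticationMode !== newProps.authenticationMode;
}

// State after the rollback that skipped the EKS cluster resource.
const recordedInTemplate: AccessConfig = {};
const actualOnCluster: AccessConfig = { authenticationMode: 'API_AND_CONFIG_MAP' };
const desired: AccessConfig = { authenticationMode: 'API_AND_CONFIG_MAP' };

// The template diff says "changed", so an update is attempted...
const updateIssued = templateDiffRequestsUpdate(recordedInTemplate, desired);
// ...even though the live cluster is already in the desired mode.
const alreadyApplied = actualOnCluster.authenticationMode === desired.authenticationMode;

console.log({ updateIssued, alreadyApplied });
```

Both values come out `true`, which is the combination that produces the "from API_AND_CONFIG_MAP to API_AND_CONFIG_MAP" error.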

Additional points :

  • Checked with the EKS team: you can change from CONFIG_MAP to API_AND_CONFIG_MAP, and then from API_AND_CONFIG_MAP to API. These operations cannot be reversed: once you convert to API, you cannot go back to CONFIG_MAP or API_AND_CONFIG_MAP, and you cannot change from API_AND_CONFIG_MAP back to CONFIG_MAP.
  • Checked with the CDK team: because the CDK manages the cluster through a custom resource, the EKS cluster cannot be removed from the stack and imported back.
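
The transition rules above can be written down as a small lookup table. This is a sketch only: `ALLOWED_TRANSITIONS` and `checkTransition` are hypothetical names, and a direct CONFIG_MAP to API change is treated as rejected here only because the rules above do not mention it, which is an assumption.

```ts
type AuthMode = 'CONFIG_MAP' | 'API_AND_CONFIG_MAP' | 'API';

// One-way transitions as described by the EKS team in this issue.
const ALLOWED_TRANSITIONS: Record<AuthMode, AuthMode[]> = {
  CONFIG_MAP: ['API_AND_CONFIG_MAP'],
  API_AND_CONFIG_MAP: ['API'],
  API: [],
};

type TransitionCheck = 'no-op' | 'allowed' | 'rejected';

// Classifies a requested change; an identical mode is a no-op, not an error.
function checkTransition(current: AuthMode, desired: AuthMode): TransitionCheck {
  if (current === desired) {
    return 'no-op';
  }
  return ALLOWED_TRANSITIONS[current].includes(desired) ? 'allowed' : 'rejected';
}

console.log(checkTransition('CONFIG_MAP', 'API_AND_CONFIG_MAP'));
console.log(checkTransition('API_AND_CONFIG_MAP', 'API_AND_CONFIG_MAP'));
console.log(checkTransition('API', 'CONFIG_MAP'));
```

Treating the identical-mode case as a separate `no-op` result, rather than folding it into `rejected`, is the distinction this issue asks for.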

Resources:
https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/custom-resource-handlers/lib/aws-eks/cluster-resource-handler/cluster.ts

Expected Behavior

The custom resource should handle errors gracefully. If a setting or configuration is already applied to, or identical in, the target resource, the custom resource should send a success signal to the CloudFormation (CFN) service.

Current Behavior

Deploying the stack again (without any other resource updates) with the new EKS cluster configuration fails with "Unsupported authentication mode update from API_AND_CONFIG_MAP to API_AND_CONFIG_MAP", because the actual resource already has the new configuration.

Reproduction Steps

Mentioned in the description

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.148.0

Framework Version

No response

Node.js Version

NA

OS

NA

Language

TypeScript, .NET

Language Version

No response

Other information

No response

@gokendra1 gokendra1 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 6, 2024
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Aug 6, 2024
@pahud
Contributor

pahud commented Aug 6, 2024

@gokendra1

Thank you. Yes this could happen. I will discuss this with the team today.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Aug 6, 2024
@pahud pahud self-assigned this Aug 6, 2024
@pahud
Contributor

pahud commented Aug 6, 2024

OK, I tried to reproduce this scenario using the native eks.CfnCluster L1 to see how it behaves. It looks like CFN simply does nothing when updating from API_AND_CONFIG_MAP to API_AND_CONFIG_MAP.

I guess we should implement that as well in CDK and gracefully ignore the SDK error.

@pahud pahud removed their assignment Aug 6, 2024
@pahud pahud changed the title Aws-eks/cluster-resource-handler: Having issue while updating the authentication mode eks: authentication mode failed to update Aug 6, 2024
@pahud
Contributor

pahud commented Aug 6, 2024

We should implement a check similar to this one:

// update-cluster-version will fail if we try to update to the same version,
// so skip in this case.
const cluster = (await this.eks.describeCluster({ name: this.clusterName })).cluster;
if (cluster?.version === newVersion) {
  console.log(`cluster already at version ${cluster.version}, skipping version update`);
  return;
}
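
The same skip-if-already-applied pattern could be applied to the authentication mode. The following is a minimal sketch, not the code merged in the fix: `ClusterClient`, `InMemoryClient` and `updateAuthModeIfNeeded` are hypothetical names, and the interface stands in for the real EKS SDK calls so the example stays self-contained.

```ts
type AuthMode = 'CONFIG_MAP' | 'API_AND_CONFIG_MAP' | 'API';

// Hypothetical minimal client surface; a real handler would call the EKS SDK.
interface ClusterClient {
  describeAuthMode(clusterName: string): Promise<AuthMode | undefined>;
  updateAuthMode(clusterName: string, mode: AuthMode): Promise<void>;
}

// Skips the update when the live cluster already has the desired mode,
// instead of letting the service reject an identical-mode update.
async function updateAuthModeIfNeeded(
  client: ClusterClient,
  clusterName: string,
  desired: AuthMode,
): Promise<'skipped' | 'updated'> {
  const current = await client.describeAuthMode(clusterName);
  if (current === desired) {
    console.log(`cluster already at authMode ${current}, skipping authMode update`);
    return 'skipped';
  }
  await client.updateAuthMode(clusterName, desired);
  return 'updated';
}

// In-memory stand-in for a cluster, used only to exercise the sketch.
class InMemoryClient implements ClusterClient {
  mode: AuthMode | undefined;
  updateCalls = 0;

  constructor(initial: AuthMode | undefined) {
    this.mode = initial;
  }

  async describeAuthMode(_clusterName: string): Promise<AuthMode | undefined> {
    return this.mode;
  }

  async updateAuthMode(_clusterName: string, mode: AuthMode): Promise<void> {
    this.mode = mode;
    this.updateCalls += 1;
  }
}

// The drifted cluster from this issue: already at API_AND_CONFIG_MAP.
const drifted = new InMemoryClient('API_AND_CONFIG_MAP');
updateAuthModeIfNeeded(drifted, 'demo-cluster', 'API_AND_CONFIG_MAP')
  .then((result) => console.log(result));
```

The check reads the live cluster state rather than the previous template properties, which is what lets it succeed after the state has drifted.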


Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.


@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2024
xazhao pushed a commit to xazhao/aws-cdk that referenced this issue Sep 12, 2024
The cluster resource handler would fail when updating the authMode with exactly the same mode. This could happen as described in aws#31032

We need to check if the cluster is already at the desired authMode and gracefully ignore the update.

### Issue # (if applicable)

Closes aws#31032

### Reason for this change



### Description of changes



### Description of how you validated changes

This PR addresses a very special case described in aws#31032, for which a unit test or integ test is not easy to write. Instead, I validated it with a manual deployment.

step 1: initial deployment of a default EKS cluster with undefined authenticationMode
step 2: update the cluster and add an S3 bucket that would fail and trigger the rollback. At this point, the EKS auth mode is updated but cannot be rolled back. This leaves the resource state out of sync with CFN.
step 3: re-deploy the same stack without the S3 bucket but with the same auth mode as in step 2. As the cluster has already modified its auth mode, this step should gracefully succeed.


```ts
import {
  App, Stack, StackProps,
  aws_ec2 as ec2,
  aws_s3 as s3,
} from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { getClusterVersionConfig } from './integ-tests-kubernetes-version';

interface EksClusterStackProps extends StackProps {
  authMode?: eks.AuthenticationMode;
  withFailedResource?: boolean;
}

class EksClusterStack extends Stack {
  constructor(scope: App, id: string, props?: EksClusterStackProps) {
    super(scope, id, {
      ...props,
      stackName: 'integ-eks-update-authmod',
    });

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2, natGateways: 1, restrictDefaultSecurityGroup: false });

    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      ...getClusterVersionConfig(this, eks.KubernetesVersion.V1_30),
      defaultCapacity: 0,
      authenticationMode: props?.authMode,
    });

    if (props?.withFailedResource) {
      const bucket = new s3.Bucket(this, 'Bucket', { bucketName: 'aws' });
      bucket.node.addDependency(cluster);
    }

  }
}

const app = new App();

// create a simple eks cluster for the initial deployment
// new EksClusterStack(app, 'create-stack');

// 1st attempt to update with an intentional failure
new EksClusterStack(app, 'update-stack', {
  authMode: eks.AuthenticationMode.API_AND_CONFIG_MAP,
  withFailedResource: true,
});

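// 2nd attempt to update using the same authMode.
// Both attempts use the construct id 'update-stack', so enable only one at a time:
// comment out the 1st attempt above and uncomment this block before re-deploying.
// new EksClusterStack(app, 'update-stack', {
//   authMode: eks.AuthenticationMode.API_AND_CONFIG_MAP,
//   withFailedResource: false,
// });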
```

And it's validated in `us-east-1`.



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*