EbsCsiDriverAddon: Waiter has timed out #894
Comments
@dedrone-fb Do you have worker nodes running? I ask because it is unclear what kind of EC2 instance types you fed to your cluster and whether they were provisioned. You can run
Another possible reason is insufficient capacity. I assume cluster autoscaler should address it (it is in your list), but it may take longer than expected to roll out a new node and hence result in the timeout. Please also share your props object: minSize, cluster version.
The following blueprint provisioned fine:
import * as cdk from "aws-cdk-lib";
import * as blueprints from "@aws-quickstart/eks-blueprints";

const app = new cdk.App();

// Same add-on set as in the report, including the EBS CSI driver.
const addOns = [
    new blueprints.addons.CalicoOperatorAddOn(),
    new blueprints.addons.MetricsServerAddOn(),
    new blueprints.addons.ClusterAutoScalerAddOn(),
    new blueprints.addons.AwsLoadBalancerControllerAddOn(),
    new blueprints.addons.VpcCniAddOn(),
    new blueprints.addons.CoreDnsAddOn(),
    new blueprints.addons.KubeProxyAddOn(),
    new blueprints.addons.EbsCsiDriverAddOn()
];
// Managed node group provider with default settings.
const clusterProvider = new blueprints.MngClusterProvider();
const eksBlueprint = blueprints.EksBlueprint.builder()
    .addOns(...addOns)
    .region("us-east-1")
    .version("auto")
    .useDefaultSecretEncryption(true)
    .clusterProvider(clusterProvider)
    .name("reprod-case-ebs")
    .build(app, "reprod-case-ebs");
I'd like to put this on hold. We currently suspect some kind of permission or quota problem. Removing any two addons seems to fix the problem (we tried with EBS CSI but without Calico and Metrics and it worked). Will report back.
I am seeing a similar issue with the following config:
const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.SecretsStoreAddOn({
        rotationPollInterval: '120s',
        syncSecrets: true
    }),
    argoAddon,
    new blueprints.addons.CalicoOperatorAddOn(),
    new blueprints.addons.MetricsServerAddOn(),
    new blueprints.addons.ClusterAutoScalerAddOn(),
    new blueprints.addons.AwsLoadBalancerControllerAddOn(),
    new blueprints.addons.VpcCniAddOn(),
    new blueprints.addons.CoreDnsAddOn(),
    new blueprints.addons.KubeProxyAddOn(),
    new blueprints.addons.OpaGatekeeperAddOn(),
];
const stack = blueprints.EksBlueprint.builder()
    .account(account)
    .region(region)
    .version('auto')
    .addOns(...addOns)
    .useDefaultSecretEncryption(true)
    .enableControlPlaneLogTypes(blueprints.ControlPlaneLogType.AUDIT)
    .enableGitOps(blueprints.GitOpsMode.APPLICATION)
    .teams(new TeamPlatform(props.gitops.platformTeamUserRoleArn), new TeamDeveloper(props.gitops.developerTeamUserRoleArn))
    .build(app, id + '-eks-bps', { env: props.env });
Is this possibly related to aws/aws-cdk#26838?
Update: Also tried without GitOps enabled and seeing the same issue.
Update: I can see the following error in CloudTrail around the time of the cdk deploy failure:
Update: I've found the root cause of our timeout. For us at least, it appears to be caused by Lambda concurrency limits in a new AWS account. The underlying EKS construct spins up many Lambdas as part of the KubectlProvider implementation. As CDK does the deploy, it waits for these Lambdas to apply kubectl commands in the new cluster. In our case, a new AWS account had a Concurrent Executions limit of 10, which is not high enough for the blueprint deploy and resulted in these Lambda requests being throttled (i.e. canceled with no error). This problem is probably exacerbated if you are installing multiple addons. This does not appear to be an issue with
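For anyone who wants to rule this out before deploying, here is a minimal sketch of a pre-flight check on the account's Lambda concurrency limit. It assumes you have saved the JSON output of `aws lambda get-account-settings` and parsed it into an object; the `AccountLimit.ConcurrentExecutions` field is the one that command reports. The function name `checkLambdaConcurrency` and the threshold of 100 are hypothetical: the thread only establishes that a limit of 10 was too low, not what value is enough.

```typescript
// Subset of the JSON printed by `aws lambda get-account-settings`.
interface LambdaAccountSettings {
  AccountLimit?: {
    ConcurrentExecutions?: number;
  };
}

interface ConcurrencyCheck {
  limit: number | undefined;
  sufficient: boolean;
  message: string;
}

// `minimumRequired` is a hypothetical threshold chosen by the caller;
// the thread does not state how much concurrency a blueprint deploy needs.
function checkLambdaConcurrency(
  settings: LambdaAccountSettings,
  minimumRequired: number
): ConcurrencyCheck {
  const limit = settings.AccountLimit?.ConcurrentExecutions;
  if (limit === undefined) {
    return {
      limit,
      sufficient: false,
      message: "Concurrent executions limit not found in account settings.",
    };
  }
  const sufficient = limit >= minimumRequired;
  const message = sufficient
    ? `Limit ${limit} meets the assumed minimum of ${minimumRequired}.`
    : `Limit ${limit} is below the assumed minimum of ${minimumRequired}; kubectl Lambdas may be throttled.`;
  return { limit, sufficient, message };
}

// Example: the new-account limit of 10 reported in this thread,
// checked against an assumed (not documented) minimum of 100.
const newAccount: LambdaAccountSettings = {
  AccountLimit: { ConcurrentExecutions: 10 },
};
const result = checkLambdaConcurrency(newAccount, 100);
console.log(result.message);
```

If the check fails, the limit can be raised through a Service Quotas increase request for Lambda concurrent executions before retrying the deploy.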
@hshepherd thank you for this insight, it would have been very hard for us to reproduce.
import "reflect-metadata";
Reflect.defineMetadata("ordered", true, addons.EbsCsiDriverAddOn); // repeat for all addons
This is more of an experimental feature tbh.
Describe the bug
We are trying to deploy an EKS Blueprint with the EBS CSI AddOn. We reproducibly run into this error message
Expected Behavior
The EBS CSI AddOn is successfully added to the newly created cluster
Current Behavior
Rollback initiated
Reproduction Steps
Possible Solution
No response
Additional Information/Context
Looked at and tried aws-samples/stable-diffusion-on-eks#5 - but no luck
CDK CLI Version
2.115.0 (build 58027ee)
EKS Blueprints Version
1.13.1
Node.js Version
v18.16.0
Environment details (OS name and version, etc.)
Ubuntu Linux 22.04
Other information
No response