cdk deploy: Waiter times out on clusterautoscaler #856

Open
bconner22 opened this issue Oct 10, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@bconner22

Describe the bug

Following this link.
I did this yesterday afternoon and again this morning; the stack failed the same way both times.

From the CloudFormation console:

2023-10-10 09:36:50 UTC-0500 eksblueprintblueprintsaddonclusterautoscalersamanifestblueprintsaddonclusterautoscalersaServiceAccountResource72D82586
CREATE_FAILED
Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"} at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30) at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActiveV2.js:52:46) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async defaultInvokeFunction (/var/task/outbound.js:1:875) at async invokeUserFunction (/var/task/framework.js:1:2192) at async onEvent (/var/task/framework.js:1:369) at async Runtime.handler (/var/task/cfn-response.js:1:1573)

From my cli:
Do you wish to deploy these changes (y/n)? y
eks-blueprint: deploying... [1/1]
eks-blueprint: creating CloudFormation changeset...
[█████████████████████████████████▎························] (46/80)

9:36:50 AM | CREATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | eks-blueprint/blue...e/Resource/Default
Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":
9:36:50 AM | CREATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | eksblueprintbluepr...ntResource72D82586
Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason"
:"Waiter has timed out"}
at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30)
at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActi
veV2.js:52:46)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async defaultInvokeFunction (/var/task/outbound.js:1:875)
at async invokeUserFunction (/var/task/framework.js:1:2192)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9c36b5b4-88cb-45af-b4cb-1f1056a35886)

❌ eks-blueprint failed: Error: The stack named eks-blueprint failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30)
at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActiveV2.js:52:46)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async defaultInvokeFunction (/var/task/outbound.js:1:875)
at async invokeUserFunction (/var/task/framework.js:1:2192)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9c36b5b4-88cb-45af-b4cb-1f1056a35886)
at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:467:10232)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:470:179911)
at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:470:163159

❌ Deployment failed: Error: The stack named eks-blueprint failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30)
at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActiveV2.js:52:46)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async defaultInvokeFunction (/var/task/outbound.js:1:875)
at async invokeUserFunction (/var/task/framework.js:1:2192)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9c36b5b4-88cb-45af-b4cb-1f1056a35886)
at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:467:10232)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:470:179911)
at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:470:163159

The stack named eks-blueprint failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30)
at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActiveV2.js:52:46)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async defaultInvokeFunction (/var/task/outbound.js:1:875)
at async invokeUserFunction (/var/task/framework.js:1:2192)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9c36b5b4-88cb-45af-b4cb-1f1056a35886)

Expected Behavior

The cluster and addons should deploy successfully.

Current Behavior

Errors are above

Reproduction Steps

Follow https://aws-quickstart.github.io/cdk-eks-blueprints/getting-started/

Possible Solution

Does the waiter need to wait for longer?
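
For context on what "Waiter has timed out" means, here is a minimal sketch of a polling waiter. It is not the AWS SDK's implementation of waitUntilFunctionActiveV2; the names waitUntilActive, WaiterOptions, and getState are hypothetical, and only the Lambda function states (Pending, Active, Inactive, Failed) and the TIMEOUT result shape are taken from Lambda and the error above. The sketch shows that a longer wait only helps if the polled state eventually becomes Active; if the function stays Pending, any timeout will eventually be hit.

```typescript
// Lambda function states; the waiter in the stack trace polls for "Active".
type FunctionState = "Pending" | "Active" | "Inactive" | "Failed";

// Hypothetical option and result shapes for this sketch.
interface WaiterOptions {
  maxWaitMs: number; // total time budget for the waiter
  delayMs: number;   // pause between polls
}

interface WaiterResult {
  state: "SUCCESS" | "FAILURE" | "TIMEOUT";
  reason?: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll getState until it reports "Active", reports "Failed",
// or the time budget runs out.
async function waitUntilActive(
  getState: () => Promise<FunctionState>,
  options: WaiterOptions
): Promise<WaiterResult> {
  const deadline = Date.now() + options.maxWaitMs;
  while (true) {
    const state = await getState();
    if (state === "Active") {
      return { state: "SUCCESS" };
    }
    if (state === "Failed") {
      return { state: "FAILURE", reason: "Function entered the Failed state" };
    }
    // Give up if the next poll would land past the deadline.
    if (Date.now() + options.delayMs > deadline) {
      return { state: "TIMEOUT", reason: "Waiter has timed out" };
    }
    await sleep(options.delayMs);
  }
}
```

With a getState that always returns "Pending", this returns a TIMEOUT result no matter how large maxWaitMs is, which suggests looking at why the function never becomes active rather than only raising the timeout.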

Additional Information/Context

I'm in an AWS Organizations management account, using an IAM user, but otherwise the account is empty. The Lambda functions did appear to deploy correctly, and both they and the EKS cluster were in us-east-1. I also ran cdk bootstrap aws://<MY_ACCOUNT_NUMBER>/us-east-1, as I saw someone ask to confirm that on a similar issue.

CDK CLI Version

2.99.1 (build b2a895e)

EKS Blueprints Version

1.12.0

Node.js Version

v20.8.0

Environment details (OS name and version, etc.)

OSX on Intel chip

Other information

No response

@bconner22 bconner22 added the bug Something isn't working label Oct 10, 2023
@AsimPoptani

Open CloudFormation in the AWS web console and look at your stack. Look for anything that has failed: what reason does it give? I have not come across this exact issue, but from experience this feels like an IAM permission issue for your account.

@bconner22
Author

Hey Asim, thanks for the insight. The AWS web interface had the following in CloudFormation for the error:

2023-10-10 09:36:50 UTC-0500 eksblueprintblueprintsaddonclusterautoscalersamanifestblueprintsaddonclusterautoscalersaServiceAccountResource72D82586
CREATE_FAILED
Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"} at checkExceptions (/var/runtime/node_modules/@aws-sdk/util-waiter/dist-cjs/waiter.js:26:30) at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/waiters/waitForFunctionActiveV2.js:52:46) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async defaultInvokeFunction (/var/task/outbound.js:1:875) at async invokeUserFunction (/var/task/framework.js:1:2192) at async onEvent (/var/task/framework.js:1:369) at async Runtime.handler (/var/task/cfn-response.js:1:1573)

The user I'm using from the CLI is an admin user, which I believe only prevents one from seeing billing. The module does of course spin up many IAM roles of its own. Are you thinking that it might be one of those?

@AsimPoptani

Hmm, it does not look like a perms issue then if you are using admin. The only thing that I think may help your case is to delete the stack completely and try again. This may involve deleting some resources manually. Otherwise, I am not sure what the issue could be. Sorry that I cannot be of more help.

@elamaran11
Collaborator

@bconner22 I would recommend doing a full cleanup and running it again. I would assume this is a temporary, one-time issue. Please keep us posted.

@hshepherd

Crossposting as I believe these two issues are related:
#894 (comment)

@shapirov103
Collaborator

@bconner22 as stated in #894, the per-account service quota for Lambda concurrent executions may be the issue. Another possible root cause is that the default quota of 1000 is exhausted by other Lambda functions deployed in the same account (this could be sporadic).
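
One way to check the quota side of this is to look at the account limits that Lambda's GetAccountSettings API reports. The sketch below is an assumption about how one might interpret that response, not something from the blueprints project: the function name checkConcurrencyQuota and its warning messages are hypothetical, while the field names ConcurrentExecutions and UnreservedConcurrentExecutions mirror the AccountLimit section of the API response. It flags an account quota below the default of 1000 and any concurrency set aside by per-function reservations.

```typescript
// Subset of the AccountLimit object returned by Lambda's GetAccountSettings API.
interface AccountLimit {
  ConcurrentExecutions?: number;           // account-wide concurrency quota
  UnreservedConcurrentExecutions?: number; // quota left after per-function reservations
}

// Hypothetical helper: returns human-readable warnings, or an empty
// array when nothing looks unusual. expectedQuota defaults to the
// 1000 mentioned in this thread.
function checkConcurrencyQuota(
  limit: AccountLimit,
  expectedQuota: number = 1000
): string[] {
  const warnings: string[] = [];
  const total = limit.ConcurrentExecutions;
  const unreserved = limit.UnreservedConcurrentExecutions;

  if (total === undefined) {
    warnings.push("ConcurrentExecutions is missing from the response");
    return warnings;
  }
  if (total < expectedQuota) {
    warnings.push(
      `Account quota is ${total}, below the expected ${expectedQuota}`
    );
  }
  if (unreserved !== undefined && unreserved < total) {
    warnings.push(
      `${total - unreserved} of ${total} concurrent executions are reserved by specific functions`
    );
  }
  return warnings;
}
```

The input can be taken from the AccountLimit section of the output of aws lambda get-account-settings. Note that these fields describe limits and reservations, not how many executions are in flight at a given moment, so a clean result does not rule out the sporadic exhaustion described above.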


5 participants