fix(logs): LogRetention resources fail with rate exceeded errors #26858
The `LogRetention` Custom Resource used to be able to handle server-side throttling that occurs when a lot of requests are made to the CloudWatch Logs service at the same time. Handling of this error case got lost during the migration to SDK v3.
If we have a lot of `LogRetention` Custom Resources in a single Stack, CloudFormation apparently applies some internal brakes to the amount of parallelism: resources appear to be batched in smaller groups that need to complete before the next group is provisioned, and within each group there appears to be an ever so slight delay between individual resources. Together this is enough to avoid rate limiting in most circumstances. Therefore, in practice this issue only occurs when multiple stacks are deployed in parallel.
To test this scenario, I have added support to `integ-runner` for deploying all stacks of a test case concurrently. Support for arbitrary command args already existed, but needed to explicitly include the `concurrency` option. I then created an integration test that deploys 3 stacks à 25 `LogRetention` resources. This triggers the error cases described in #26837.
The fix itself is twofold:

- Pass a `maxRetries` prop value to the SDK client to increase the number of attempts of the SDK-internal throttling handling, but also enforce a minimum for these retries, since they might catch additional retryable failures that our custom outer loop does not account for.
- Explicitly retry `ThrottlingException` errors in the outer retry loop.

Closes #26837
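For illustration, the outer retry loop could be sketched roughly as below. The helper names (`withRetries`, `isThrottlingError`) and the backoff parameters are assumptions for this sketch, not the actual CDK implementation; the SDK-internal retry budget mentioned above is configured separately on the v3 client (via its `maxAttempts` client configuration).

```typescript
// Illustrative sketch only: an outer retry loop that treats server-side
// throttling as retryable. Names and defaults are assumptions, not CDK code.

function isThrottlingError(err: unknown): boolean {
  // SDK v3 surfaces CloudWatch Logs throttling as errors named 'ThrottlingException'.
  return err instanceof Error && err.name === 'ThrottlingException';
}

async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isThrottlingError(err)) {
        throw err; // out of attempts, or not a retryable throttling error
      }
      // Exponential backoff with jitter before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Retrying in an outer loop on top of the SDK's own retries means a burst of `ThrottlingException` errors from parallel stacks is absorbed by backoff rather than failing the Custom Resource outright.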
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license