
(@aws-cdk/aws-dynamodb): DynamoDb Replication: Cannot exceed quota for PoliciesPerRole: 10 #13671

Open
reinismu opened this issue Mar 18, 2021 · 36 comments
Labels
@aws-cdk/aws-dynamodb Related to Amazon DynamoDB bug This issue is a bug. ddb-legacy-table This issue has to do with DynamoDB's legacy Table construct. Close after migration guide is out. effort/large Large work item – several weeks of effort p2


@reinismu

reinismu commented Mar 18, 2021

Cannot exceed quota for PoliciesPerRole: 10 (Service: AmazonIdentityManagement; Status Code: 409; Error Code: LimitExceeded; Request ID: 3bd4add0-36e4-44ad-8e98-947977c0a638; Proxy: null)

Starting from version 1.92.0 I can't deploy my global tables.

#13300 seems to be the change that introduced this bug for me.

Environment

  • CDK CLI Version : 1.92.0
  • Framework Version: 1.92.0
  • Node.js Version: v12.6.0
  • OS : arch linux
  • Language (Version): TypeScript

This is 🐛 Bug Report

@reinismu reinismu added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 18, 2021
@github-actions github-actions bot added the @aws-cdk/aws-dynamodb Related to Amazon DynamoDB label Mar 18, 2021
@jogold
Contributor

jogold commented Mar 18, 2021

How many tables and replica regions do you have in your stack?

@reinismu
Author

@jogold I have 11 tables and 9 of them are replicated in another region as well

@jogold
Contributor

jogold commented Mar 19, 2021

@skinny85 this is indeed #13300: there's a limit of 10 managed policies per role, and we now create 2 managed policies per table... any suggestions?

@skinny85
Contributor

I just want to understand the exact numbers here...

From what I see, we create 2 Policies, yes, but we attach them to different Roles:

const onEventHandlerPolicy = new SourceTableAttachedPolicy(this, provider.onEventHandler.role!);
const isCompleteHandlerPolicy = new SourceTableAttachedPolicy(this, provider.isCompleteHandler.role!);

So, we should be able to create 10 replicas, and only the 11th one should cause this error...
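A minimal sketch of the arithmetic in this comment, assuming exactly what it describes (one SourceTableAttachedPolicy per replicated Table on each of the two shared handler Roles) and ignoring any other managed policies those Roles might carry. The helper names are hypothetical, not CDK APIs:

```typescript
// Default IAM quota on managed policies attached to one role (can be raised to 20).
const MANAGED_POLICIES_PER_ROLE = 10;

// Hypothetical helper: each replicated Table attaches one managed policy to the
// onEvent handler role and one to the isComplete handler role, so each of the
// two shared roles ends up with one policy per replicated Table.
function policiesPerHandlerRole(replicatedTables: number): number {
  return replicatedTables;
}

// Managed policies created in the Stack across both handler roles.
function totalPolicies(replicatedTables: number): number {
  return 2 * replicatedTables;
}

// True when a single handler role would exceed the per-role quota.
function exceedsQuota(replicatedTables: number, quota: number = MANAGED_POLICIES_PER_ROLE): boolean {
  return policiesPerHandlerRole(replicatedTables) > quota;
}

for (const n of [9, 10, 11]) {
  console.log(`${n} tables: ${policiesPerHandlerRole(n)} per role, ${totalPolicies(n)} total, over quota: ${exceedsQuota(n)}`);
}
```

Under these assumptions the 11th replicated Table is the first one that fails, which matches the expectation stated above.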

@reinismu can you clarify your answer a little? I don't quite follow. Can you show the replicationRegions that you use in your code, and how many of them there are?

Thanks,
Adam

@skinny85 skinny85 added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 20, 2021
@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 27, 2021
@reinismu
Author

reinismu commented Mar 28, 2021

Sorry, missed the comment.

        const episodesTable = new Table(this, `${PREFIX}-series`, {
            partitionKey: {
                name: 'seriesId',
                type: AttributeType.STRING,
            },
            tableName: `${PREFIX}-series`,
            billingMode: BillingMode.PAY_PER_REQUEST,
            replicationRegions: replicationRegions,
            removalPolicy: REMOVAL_POLICY,
            pointInTimeRecovery,
        });

In total, I have 9 tables with replicationRegions.

replicationRegions comes from environment variables. Currently we use eu-central-1 and ap-southeast-1.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Mar 29, 2021
@skinny85
Contributor

Sorry, to clarify: you have 9 separate Table instances, and each is replicated to 2 regions?

@skinny85 skinny85 added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 29, 2021
@reinismu
Author

Yes :)

@skinny85
Contributor

Interesting! I just tried this:

import * as cdk from '@aws-cdk/core';
import * as dynamodb from '@aws-cdk/aws-dynamodb';

class ExampleStack extends cdk.Stack {
    constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);

        for (let i = 1; i <= 9; i++) {
            new dynamodb.Table(this, `Table${i}`, {
                partitionKey: {
                    name: 'Id',
                    type: dynamodb.AttributeType.STRING,
                },
                replicationRegions: ['eu-central-1', 'ap-southeast-1'],
                removalPolicy: cdk.RemovalPolicy.DESTROY,
            });
        }
    }
}

const app = new cdk.App();

new ExampleStack(app, 'ExampleStack');

and it deployed successfully!

Can you try it on your end? Maybe the error is somewhere else?

Thanks,
Adam

@reinismu
Author

Not sure what's wrong, but your example doesn't work for me.

Template format error: Unresolved resource dependencies [Table7Replicaeucentral1E0F0C4BF] in the Resources block of the template

Can you try increasing 9 to 15 or so, just to check?

I upgraded my main infra stack to 1.95.1 and the issue is still there. In total, it tries to create 20 managed policies.

@skinny85
Contributor

Hmm, that's very concerning. That error shouldn't happen for such a simple template.

Can you push a minimal reproduction that shows that error to a GitHub repo? I'm curious what is causing it.

@reinismu
Author

I figured it out, though more by luck than from the error message. My main region is eu-central-1, so I can't have it in replicationRegions.
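Since the regions here come from environment variables, one way to avoid this is to filter the Stack's own region out before passing the list to the Table. A minimal sketch, assuming a hypothetical replicaRegionsFor helper (not a CDK API) and a comma-separated value such as a hypothetical REPLICATION_REGIONS variable:

```typescript
// Hypothetical helper (not a CDK API): the Table itself lives in the Stack's
// own region, so that region must not also be listed in replicationRegions.
function replicaRegionsFor(stackRegion: string, configured: string[]): string[] {
  const regions = configured.map((r) => r.trim()).filter((r) => r.length > 0);
  // Remove duplicates (keeping first-occurrence order), then drop the Stack's region.
  return Array.from(new Set(regions)).filter((r) => r !== stackRegion);
}

// Assumed comma-separated value, e.g. read from a REPLICATION_REGIONS environment variable.
const configured = "eu-central-1,ap-southeast-1".split(",");
console.log(replicaRegionsFor("eu-central-1", configured)); // [ 'ap-southeast-1' ]
```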

I'm deploying the example project and will try to break it now

@reinismu
Author

reinismu commented Mar 30, 2021

Aha got it!

I first deployed with for (let i = 1; i <= 9; i++) {, then did another deploy with for (let i = 1; i <= 12; i++) {, and it broke.

Not sure whether deploying it twice matters, but it's worth trying.

@skinny85
Contributor

skinny85 commented Mar 30, 2021

OK, at least I was able to reproduce the Unresolved resource dependencies error that you got 😜.

skinny85 added a commit to skinny85/aws-cdk that referenced this issue Mar 31, 2021
…pendencies"

When creating the Custom Resources that implement the global tables functionality,
we add dependencies between them, as you can't create replicas of the same Table concurrently.
However, if the Stack the Table is part of is env-agnostic,
we also add a CFN Condition to the Custom Resource that checks whether the given region is the deployed-to region,
and skip creating the replica in that case (as the Table itself acts as the replica in this case).
But that Condition is not compatible with the dependency clause,
as the resource will not exist if the Condition is false.

Use a trick: instead of using a DependsOn,
add CFN Metadata that refers to the other Custom Resource through a Ref expression,
which adds an implicit dependency,
and wrap the entire Metadata in a Fn::If,
guarded by the same Condition the other Custom Resource uses.

Noticed by a customer in aws#13671 (comment).
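The trick described in this commit message can be illustrated with plain CloudFormation JSON. This is a minimal sketch, not the code from the actual fix: the logical IDs, Condition names, the DynamoDbReplicationDependency metadata key, and the helper functions are all hypothetical. Each replica after the first carries no DependsOn; its Metadata refers to the previous replica with Ref, wrapped in Fn::If on the previous replica's Condition, so the reference only exists when that replica is actually created.

```typescript
type Json = { [key: string]: unknown };

interface ReplicaRef {
  logicalId: string;
  condition: string;
}

// Hypothetical naming scheme used only in this sketch.
const conditionName = (region: string): string => `StackRegionNotEquals${region.replace(/-/g, "")}`;
const replicaLogicalId = (region: string): string => `Replica${region.replace(/-/g, "")}`;

// One Custom::DynamoDBReplica resource (other properties omitted).
function replicaResource(region: string, previous?: ReplicaRef): Json {
  const resource: Json = {
    Type: "Custom::DynamoDBReplica",
    // Only created when the Stack is NOT deployed to `region` itself.
    Condition: conditionName(region),
    Properties: { Region: region },
  };
  if (previous) {
    // No DependsOn: it would be invalid whenever `previous` is skipped.
    // The Ref inside Metadata adds an implicit dependency, and the Fn::If on
    // the previous replica's Condition drops the Ref when it does not exist.
    resource["Metadata"] = {
      "Fn::If": [
        previous.condition,
        { DynamoDbReplicationDependency: { Ref: previous.logicalId } },
        { Ref: "AWS::NoValue" },
      ],
    };
  }
  return resource;
}

// Builds the Conditions and Resources sections for a chain of replicas.
function buildTemplate(regions: string[]): Json {
  const conditions: Json = {};
  const resources: Json = {};
  let previous: ReplicaRef | undefined = undefined;
  for (const region of regions) {
    conditions[conditionName(region)] = {
      "Fn::Not": [{ "Fn::Equals": [region, { Ref: "AWS::Region" }] }],
    };
    resources[replicaLogicalId(region)] = replicaResource(region, previous);
    previous = { logicalId: replicaLogicalId(region), condition: conditionName(region) };
  }
  return { Conditions: conditions, Resources: resources };
}

console.log(JSON.stringify(buildTemplate(["eu-central-1", "ap-southeast-1"]), null, 2));
```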
@skinny85
Contributor

First part of the fix: #13889

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 31, 2021
mergify bot pushed a commit that referenced this issue Mar 31, 2021
…esource dependencies" error (#13889)

hollanddd pushed a commit to hollanddd/aws-cdk that referenced this issue Mar 31, 2021
…esource dependencies" error (aws#13889)

@skinny85
Contributor

Yep, I was able to reproduce the original error with 11 Tables (not sure how you're getting it with just 9, but that's beside the point). The problem is the Roles for the Custom Resource framework handlers. We create a separate Policy for each replicated Table, and attach all of them to those Roles, here:

const onEventHandlerPolicy = new SourceTableAttachedPolicy(this, provider.onEventHandler.role!);
const isCompleteHandlerPolicy = new SourceTableAttachedPolicy(this, provider.isCompleteHandler.role!);

This is a tough one... I think the only way to do this properly is to somehow count how many Custom::DynamoDBReplica resources we have in the Stack, and once that number exceeds 10, we need to create new handlers (and thus Roles) for the Lambdas the custom resource framework needs.

@reinismu for now, I would split the resources into multiple Stacks, so that no Stack has more than 10 Tables with replicas.
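As a rough sketch of this workaround, assuming a plain list of table names and the 10-per-Stack bound suggested above, the replicated Tables could be partitioned into groups, one group per Stack. The helper name and the Stack naming are hypothetical:

```typescript
// Hypothetical helper: split replicated table names into groups of at most
// `maxPerStack`, so each group can be defined in its own Stack.
function groupTablesForStacks(tableNames: string[], maxPerStack = 10): string[][] {
  if (!Number.isInteger(maxPerStack) || maxPerStack < 1) {
    throw new Error("maxPerStack must be a positive integer");
  }
  const groups: string[][] = [];
  for (let i = 0; i < tableNames.length; i += maxPerStack) {
    groups.push(tableNames.slice(i, i + maxPerStack));
  }
  return groups;
}

const tables = Array.from({ length: 12 }, (_, i) => `Table${i + 1}`);
const groups = groupTablesForStacks(tables);
groups.forEach((names, i) => console.log(`ReplicatedTablesStack${i + 1}: ${names.length} tables`));
// ReplicatedTablesStack1: 10 tables
// ReplicatedTablesStack2: 2 tables
```

This only decides the grouping; each group would still need to be instantiated in its own Stack.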

@jogold any ideas on how to handle this?

@skinny85
Contributor

I wonder whether the logic of counting to 10 can be nicely encapsulated in the getOrCreate() method here:

public static getOrCreate(scope: Construct, props: ReplicaProviderProps = {}) {
  const stack = Stack.of(scope);
  const uid = '@aws-cdk/aws-dynamodb.ReplicaProvider';
  return stack.node.tryFindChild(uid) as ReplicaProvider ?? new ReplicaProvider(stack, uid, props);
}

so that not much of the client code has to change...

Also, I think I found another potential error with this feature: today, ReplicaProvider is a singleton in a Stack, except there is a replicationTimeout property that you can pass it when creating it. Pretty sure the replicationTimeout property will be completely ignored if set on a second table that has any global replicas in one Stack...
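To make the concern concrete, here is a minimal, self-contained model of the per-Stack singleton lookup above. FakeStack, the number-typed replicationTimeoutMinutes, and the strict flag are assumptions for illustration, not CDK APIs: the first caller's props win, and a second Table asking for a different timeout silently gets the existing provider. The strict branch sketches the option of throwing on a conflicting value.

```typescript
// Stand-in for the real prop, which is a Duration in the CDK.
interface ReplicaProviderProps {
  replicationTimeoutMinutes?: number;
}

// Simplified stand-in for a Stack: child constructs keyed by id.
class FakeStack {
  readonly children = new Map<string, ReplicaProvider>();
}

class ReplicaProvider {
  static readonly UID = "@aws-cdk/aws-dynamodb.ReplicaProvider";

  private constructor(readonly props: ReplicaProviderProps) {}

  // Per-Stack singleton lookup. With `strict`, a conflicting timeout throws
  // instead of being silently ignored.
  static getOrCreate(stack: FakeStack, props: ReplicaProviderProps = {}, strict = false): ReplicaProvider {
    const existing = stack.children.get(ReplicaProvider.UID);
    if (existing) {
      if (strict && existing.props.replicationTimeoutMinutes !== props.replicationTimeoutMinutes) {
        throw new Error("Conflicting replicationTimeout for the shared ReplicaProvider");
      }
      return existing; // the second caller's props are ignored
    }
    const created = new ReplicaProvider(props);
    stack.children.set(ReplicaProvider.UID, created);
    return created;
  }
}

const stack = new FakeStack();
const first = ReplicaProvider.getOrCreate(stack, { replicationTimeoutMinutes: 30 });
const second = ReplicaProvider.getOrCreate(stack, { replicationTimeoutMinutes: 60 });
console.log(first === second, second.props.replicationTimeoutMinutes); // true 30
```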

@jogold
Contributor

jogold commented Apr 6, 2021

Also, I think I found another potential error with this feature: today, ReplicaProvider is a singleton in a Stack, except there is a replicationTimeout property that you can pass it when creating it. Pretty sure the replicationTimeout property will be completely ignored if set on a second table that has any global replicas in one Stack...

You are correct. What do you suggest here? Should we throw if a different value is set on a second table? Include the value in the uid?

@jogold
Contributor

jogold commented Apr 6, 2021

I wonder whether the logic of counting to 10 can be nicely encapsulated in the getOrCreate() method here:

Not really a fan of this counting solution, but I can't think of another one...

Implementing it in getOrCreate() is just a matter of adding Math.floor(counter/10) in the uid?
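A minimal sketch of that idea, assuming a hypothetical 0-based counter of replicated Tables per Stack and a numeric suffix on the uid (the actual change may name things differently). The first 10 Tables keep the original uid, so logical IDs of existing deployments stay the same, and each further group of 10 gets its own provider and therefore its own handler Roles:

```typescript
const BASE_UID = "@aws-cdk/aws-dynamodb.ReplicaProvider";
// Each replicated Table adds one managed policy per handler Role; IAM allows 10 by default.
const TABLES_PER_PROVIDER = 10;

// `tableIndex`: 0-based position of the replicated Table among those in the Stack.
function providerUid(tableIndex: number): string {
  const group = Math.floor(tableIndex / TABLES_PER_PROVIDER);
  // Group 0 keeps the original uid, so existing logical IDs do not change.
  return group === 0 ? BASE_UID : `${BASE_UID}${group}`;
}

for (const i of [0, 9, 10, 20]) {
  console.log(i, providerUid(i));
}
```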

@reinismu
Author

reinismu commented Apr 6, 2021

Could we ask the CloudFormation team why this limit exists? Maybe it's not that hard to get rid of.

@jogold
Contributor

jogold commented Apr 6, 2021

Could we ask the CloudFormation team why this limit exists? Maybe it's not that hard to get rid of.

It's an IAM quota, not a CloudFormation one: managed policies attached to an IAM role (https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html).

You can request an increase to 20.

@reinismu
Author

reinismu commented Apr 6, 2021

Oh, I see. I guess there's no way around that, then.

Even if I request it to 20 there would still be someone who will encounter this issue :/

@skinny85
Contributor

skinny85 commented Apr 6, 2021

Also, I think I found another potential error with this feature: today, ReplicaProvider is a singleton in a Stack, except there is a replicationTimeout property that you can pass it when creating it. Pretty sure the replicationTimeout property will be completely ignored if set on a second table that has any global replicas in one Stack...

You are correct. What do you suggest here? Should we throw if a different value is set on a second table? Include the value in the uid?

Probably including the value of timeout in the UID is the way to go here.

@skinny85
Contributor

skinny85 commented Apr 6, 2021

I wonder whether the logic of counting to 10 can be nicely encapsulated in the getOrCreate() method here:

Not really a fan of this counting solution, but I can't think of another one...

Implementing it in getOrCreate() is just a matter of adding Math.floor(counter/10) in the uid?

@jogold is there any way we can do a similar trick to the one you're doing right now, but with inline policies instead of managed policies? There is no limit on the number of inline policies. Maybe change the logical ID of the inline policy based on the properties of the Table that can cause replacement? Would that work?

@jogold
Contributor

jogold commented Apr 7, 2021

Maybe change the logical ID of the inline policy based on the properties of the Table that can cause replacement? Would that work?

This should be possible but would restrict users from using tokens in those properties?

@skinny85
Contributor

skinny85 commented Apr 7, 2021

Yeah, you're right, it wouldn't be as fool-proof as the current trick... I guess counting to 10 is the way to go here!

@jogold
Contributor

jogold commented Apr 8, 2021

Probably including the value of timeout in the UID is the way to go here.

Actually, we cannot do this. For users already using a timeout, including it in the uid would change the logical ID of the Lambda function backing the custom resource. This means the service token of the custom resource would have to be updated, which is not allowed in CloudFormation.

No problem for counting to 10 because we can maintain the existing uid (and thus logical ids) up to 10: see #14054

jogold added a commit to jogold/aws-cdk that referenced this issue Apr 8, 2021
The custom resource implementation uses IAM managed policies. There's a
limit of 10 managed policies per role in IAM. Create a new provider if
we reach the limit.

Closes aws#13671
@reinismu
Author

Seems like I'm stuck at the old version for my project :/

@RomainMuller RomainMuller removed their assignment Jun 21, 2021
hollanddd pushed a commit to hollanddd/aws-cdk that referenced this issue Aug 26, 2021
…esource dependencies" error (aws#13889)

@github-actions

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 21, 2022
@irobertson

Just ran into this today; leaving comment to avoid issue auto-closing.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 22, 2022
@oleg-slapdash

Try setting the @aws-cdk/aws-iam:minimizePolicies feature flag: https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-iam.PolicyDocumentProps.html#minimize

@Tengda

Tengda commented Nov 8, 2022

I have more than 10 tables with db replication and got the same error...

@FelixRelli

Just ran into the increased limit of 20. @aws-cdk/aws-iam:minimizePolicies does not help.

@nestorFigliuolo

nestorFigliuolo commented Feb 6, 2023

Has anyone been able to fix this? We are experiencing the same problem, and because it affects an already-created stack, we are unable to split it into multiple stacks. Any help would be greatly appreciated.

@tobiasviehweger

Running into this as well... having to split my persistence into different stacks because of such an arbitrarily low limit is extremely unfortunate.

@rix0rrr
Contributor

rix0rrr commented Sep 21, 2023

This issue was for the existing Table construct, which used custom resources to implement table replication. We no longer recommend the use of the Table construct.

Instead, the TableV2 construct was released in 2.95.1 (#27023). It maps to the AWS::DynamoDB::GlobalTable resource, has better support for replication, and does not suffer from the issue described here.


Be aware that a migration from Table to TableV2 involves additional deployment steps: a deployment with a RETAIN removal policy, then a deployment that deletes the Table from the stack, then changing the code to use TableV2, and finally running cdk import. A link to a full guide will be posted once it is available.

Here are some other resources to get you started (using CfnGlobalTable instead of TableV2) if you want to get going on the migration:

@rix0rrr rix0rrr added the ddb-legacy-table This issue has to do with DynamoDB's legacy Table construct. Close after migration guide is out. label Sep 21, 2023
@pahud pahud added p2 and removed p1 labels Jun 11, 2024