Toolkit: Python application containing N+ Resources Crashes with error: Malformed request, "API" field is required
#15088
As I was writing a test case for this, I found that the most recent versions of CDK are at least resulting in a useful error in this situation:
I think it makes sense to close this issue, though a limit of 500 resources is super low given that CDK is supposed to be dealing with multiple stacks compiled into applications, and CDK constructs often create many resources on their own.
It would appear that I closed this issue prematurely. The limit that I saw was actually a per-stack limit, not a limit across the application; effectively, my test case was wrong. In my additional attempts to build a proper test case, I've found that reproducing this seems to require not just a significant number of resources, but resources that contain sufficient configuration detail to increase the size of some request payload. I'm still working on getting a proper test case written, but I'm re-opening this in the meantime for visibility's sake, in case someone else has additional information about it.
The 500-resources-per-stack limit is imposed by CloudFormation and not something we can influence. Sorry you're experiencing problems with that right now. You should have been seeing a warning appear when you started to approach this limit; that's unfortunately the best we can do. The only thing I can recommend is breaking your application up into Stacks or Nested Stacks to keep the resource count down.

"Malformed request" is an odd error. You're running into a jsii issue here; it might or might not be related to the synth error. Will forward to the jsii team to have them triage.
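The kind of near-limit warning described above can be sketched in a few lines of plain Python. This is a hypothetical helper, not actual CDK code; only the 500-resource per-stack figure comes from this thread:

```python
# Hypothetical sketch (not CDK source): warn as a stack's resource count
# approaches the CloudFormation per-stack limit of 500 resources.
RESOURCE_LIMIT = 500

def check_counts(stack_resource_counts, warn_ratio=0.8):
    """Return human-readable warnings for stacks near or over the limit."""
    warnings = []
    for stack, count in stack_resource_counts.items():
        if count > RESOURCE_LIMIT:
            warnings.append(f"{stack}: over the {RESOURCE_LIMIT}-resource limit")
        elif count >= RESOURCE_LIMIT * warn_ratio:
            warnings.append(f"{stack}: approaching the limit ({count})")
    return warnings

print(check_counts({"network": 120, "app": 480, "data": 510}))
```

Splitting an application into more stacks, as recommended above, keeps every entry in such a count well under the limit.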
"Malformed request" doesn't really fit, but my first guess would be an OOM error. @RomainMuller wdyt?
@rix0rrr It's not a per-stack limitation that we've run into, though. We're keeping the number of resources below 200 per stack, so far. OOM is an interesting thought; I'll investigate that front.
You can try to move past a potential OOM situation by raising the Node.js memory limit. If this allows your app to run, then your problem likely is an OOM error.
I was really hopeful that this might be the answer. Alas, setting it made no difference.

It definitely seems like we're overflowing some kind of storage space, and I'm beginning to think it might be the full size of the code base at this point, as I've been unable to duplicate the issue by running things in a loop. It seems to happen only when the resources are defined explicitly, so to get a valid test case I feel I might have to resort to code generation of some sort. This may mean that the use of higher-level constructs to bind things together could help us, so I'll do some investigation on that front.
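The distinction being probed here matters: heap exhaustion (a classic OOM) and call-stack exhaustion fail very differently. A self-contained Python illustration (not CDK code) of the stack-side failure mode, where a deeply nested object graph blows the call stack long before memory runs out:

```python
# Illustration only: deep recursion exhausts the call stack, not the heap.
import sys

def build_chain(depth):
    """Build a deeply nested structure, one level per call frame."""
    if depth == 0:
        return {}
    return {"child": build_chain(depth - 1)}

sys.setrecursionlimit(2000)
try:
    build_chain(10_000)  # far deeper than the recursion limit
    result = "completed"
except RecursionError:
    result = "stack exhausted"

print(result)  # prints "stack exhausted"
```

The analogous knob on the Node.js side is V8's stack size, which is what the `--stack-size` suggestion later in this thread targets.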
Were we able to determine whether this was caused by the CloudFormation stack resource limitation, or if it is a separate bug unrelated to said limit?
It's definitely not a stack resource limitation. We're not even coming close to the old limits on a per-stack basis. |
I ran into the same issue, there is no information in the error that relates to resource limitation.
I notice the platform on which this was reported is Windows. Is everyone who is seeing this error on Windows, by any chance?
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
I have not seen this error outside of a Windows environment. Unfortunately I'm no longer working with the original code base in which I saw this, but I'll reach out to the current maintainers to see if anyone can duplicate it on a non-Windows machine.
I unfortunately ran into this issue yesterday and have been doing some debugging. A consistently reproducible scenario is still escaping me, but perhaps some of this information can help.
@kgeisink, you're able to make the error occur by disabling "aws-iam:minimizePolicies"? I.e., disabling the feature flag, bigger unminimized policies === error; enabling, minimized policies, no error? Just double checking.

It does seem like this has only been reported by Windows users thus far, so at least that gives us a place to start with investigation. As a workaround, you can try increasing the Node stack size.
@MrArnoldPalmer, In short, yes, but I only tested it after I'd found the "breaking point". (To clarify: the breaking point is not tied to one particular resource, just (presumably) the amount of resources.) In addition to Windows, we also got the same error on our CodeBuild instance running a Linux image.

There is an AWS Support Case (9889364641) currently active in which I provided a project with a reproducible scenario. I hope that it will also be reproducible on your end and help to provide some more insight into this issue.
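For anyone searching later: the feature flag in question is toggled via the context section of the project's cdk.json. A minimal sketch (the "app" command is a placeholder for your own entry point):

```json
{
  "app": "python app.py",
  "context": {
    "@aws-cdk/aws-iam:minimizePolicies": true
  }
}
```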
If I understand correctly, it's the stack size that needs increasing. In PowerShell:

```powershell
& (env:NODE_OPTIONS="--stack-size=10000"; cdk deploy $args)
```
@bgshacklett Unfortunately that didn't work for me as written.

Edit: Found the right way to set it, and can confirm I do not have the issue anymore locally when temporarily increasing the stack size.
@kgeisink thanks for the confirmation. Planning to look into this when I am able to, but it will likely be another couple of days.
@MrArnoldPalmer Apologies for the false positive; it appears that the fix did not hold up after all.
@kgeisink @bgshacklett wondering if either of you have code available that we can look at that causes this error? When testing the jsii Python runtime and creating large numbers of objects (10 million+), I haven't gotten this error to reproduce. Is the stack trace originating in a consistent place within your code (a specific construct, etc.)?
@MrArnoldPalmer I have shared our codebase with a reproducible state via the AWS Support case that I mentioned above (9889364641). Unfortunately I am not able to share it via other means due to NDA restrictions. Would you be able to access it through there? If not, I can try anonymising the code, but that might take a little while given the size.

I have not been able to pinpoint it to a specific place in the code; the stack trace is also very generic, so I will add it as an attachment. While commenting and uncommenting various stacks and resources, I only noticed the trend that the total number of resources did seem to matter somehow. It does appear that there is some place in our project that causes incredibly inefficient resource management, as I was also not able to reproduce it by generating a large number of resources in loops. Though I do not have enough insight into CDK/jsii internals to know how much benefits from reuse, of course.

I've also added some of the JSII_DEBUG output leading up to the error, including the error itself.
@MrArnoldPalmer I was wondering if the code that I shared was helpful, and whether you happen to have an update? If there is anything I can do to help troubleshoot on my end, please let me know.
I no longer have access to the original code base, unfortunately.
@kgeisink yes! It is very helpful, and I was able to reproduce the issue with that codebase on macOS. I spent some time digging into debug logs, but nothing immediately jumped out and I got pulled onto some other stuff. I will keep working on this and provide an update when I'm able to.
I had exactly the same error, with the following code applying a permissions boundary to all roles in a stack:

```python
# This imports an existing policy.
boundary = _iam.ManagedPolicy.from_managed_policy_arn(
    scope=stack,
    id="Boundary",
    managed_policy_arn='arn:aws:iam::123456789012:policy/boundary',
)

# Apply the boundary to all Roles in a stack.
_iam.PermissionsBoundary.of(stack).apply(boundary)
```

I hope this information will be useful for someone else:

https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_iam-readme.html#permissions-boundaries
https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_iam/PermissionsBoundary.html
The code base I was working with used a custom aspect for a similar purpose. |
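Since aspects keep coming up: an aspect is essentially a visitor invoked for every node of the construct tree, which is one reason a very large tree can translate into very deep call stacks. A plain-Python sketch of the pattern (hypothetical names; no aws-cdk-lib dependency):

```python
# Toy model of the aspect/visitor pattern; not aws-cdk-lib code.
class Node:
    """Stand-in for a construct-tree node."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.permissions_boundary = None

def apply_aspect(root, visit):
    """Depth-first walk, calling visit() on every node in the tree."""
    visit(root)
    for child in root.children:
        apply_aspect(child, visit)

def set_boundary(node):
    # Only role-like nodes get the boundary in this toy example.
    if node.name.startswith("role"):
        node.permissions_boundary = "arn:aws:iam::123456789012:policy/boundary"

def count_bounded(node):
    total = 1 if node.permissions_boundary else 0
    return total + sum(count_bounded(c) for c in node.children)

tree = Node("stack", [Node("role-a"), Node("bucket", [Node("role-b")])])
apply_aspect(tree, set_boundary)
print(count_bounded(tree))  # prints 2
```

Note that `apply_aspect` recurses once per level of nesting, so tree depth, not just node count, drives stack usage.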
It would appear that the various problems in this issue have all been solved. I'm going to go ahead and close this issue. If you believe this is in error, please feel free to open a new issue. |
My team has run into an issue with the CDK toolkit crashing when we reach a certain number of resources within our application. We've ended up having to split the application multiple times, at this point, to deal with this limitation, which does not appear to be documented.
Any operation which causes CDK to run `app.synth()` appears to result in a crash. This may be as simple as running `cdk list`. The exact number of resources in question is unknown at this time, but I suspect the number is somewhere around 1,000, split across about 15 stacks.
Reproduction Steps

I will attempt to provide a reproducible example shortly. Run `cdk list` within the application directory.

What did you expect to happen?
CDK should output a list of stacks.
What actually happened?
CDK crashes with an error which appears to originate from JSII:
There is a line in the JSII code which matches this error quite well:
https://github.com/aws/jsii/blob/main/packages/@jsii/runtime/lib/host.ts#L97
Environment
Other
Further details incoming.
Update: 2021-06-24:
I've attempted numerous ways of looping over resource definitions in an attempt to recreate the issue and I have, thus far, been unable to create a test case outside of our repository, which is, sadly, not something I can share.
Above details have been updated, as well as possible, to include recent discoveries.
This is a 🐛 Bug Report.