s3 bucket deployment time outs with many files #7571

Closed
jeshan opened this issue Apr 23, 2020 · 8 comments

Labels: @aws-cdk/aws-s3 (Related to Amazon S3), bug (This issue is a bug.)

Comments

jeshan commented Apr 23, 2020

Deploying a website that has thousands of files with BucketDeployment times out, because the copy needs to run longer than the Lambda function timeout of 15 minutes.

Reproduction Steps

  1. Create a website with thousands of files
  2. Deploy as follows:
const { BucketDeployment, Source } = require("@aws-cdk/aws-s3-deployment");

new BucketDeployment(this, "WebsiteDeployment", {
  sources: [Source.asset(placeHolderSource)],
  destinationBucket: websiteBucket,
  distribution,
  retainOnDelete: false,
});

Error Log

The Lambda function log:

REPORT RequestId: 3e301853-5a0a-4220-a93f-eb9b03bdcfa4 Duration: 900004.88 ms Billed Duration: 900000 ms Memory Size: 128 MB Max Memory Used: 128 MB

The CloudFormation stack:

[screenshot of the CloudFormation stack events]

Environment

  • CLI Version: 1.34.1
  • Framework Version:
  • OS: Linux
  • Language: JavaScript

Other

Suggestions

I think it's because the s3 sync is checking, thousands of times, whether the files already exist.

# sync from "contents" to destination
aws_command("s3", "sync", "--delete", contents_dir, s3_dest, *create_metadata_args(user_metadata, system_metadata))

  1. I'm thinking there could be an alternative, or an option for CDK users to use the cp command instead, or:
  2. the team could make it use aws s3 cp --recursive when first creating the deployment and continue using sync for subsequent updates (see the sketch below). That way, at least the CDK stack will stabilise (but this postpones dealing with the underlying issue).
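
A minimal sketch of suggestion 2, assuming the handler were told whether this is the initial deployment (the is_initial_deploy flag is hypothetical; aws_command, contents_dir, s3_dest, create_metadata_args, user_metadata and system_metadata are the names from the handler snippet above):

# Hypothetical sketch: plain recursive copy on the first deployment so the
# stack can stabilise, "sync --delete" for subsequent incremental updates.
metadata_args = create_metadata_args(user_metadata, system_metadata)

if is_initial_deploy:  # hypothetical flag passed down from the custom resource
    # first deploy: no per-object existence checks and nothing to delete
    aws_command("s3", "cp", "--recursive", contents_dir, s3_dest, *metadata_args)
else:
    # later deploys: keep the destination in step with the source
    aws_command("s3", "sync", "--delete", contents_dir, s3_dest, *metadata_args)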

This is a 🐛 Bug Report.

@jeshan jeshan added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 23, 2020

jeshan commented Apr 23, 2020

I saw that we can configure memory on the deployment resource to increase the lambda performance. Is that the way to go?
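
For reference, the deployment resource exposes a memoryLimit prop (memory_limit in the CDK Python binding) for this; a minimal sketch with the Python binding, assuming the same placeholder names as the JavaScript snippet above:

# Sketch only: raise the deployment handler's memory above the 128 MB default;
# more memory also gives the Lambda more CPU and network throughput.
from aws_cdk import aws_s3_deployment as s3deploy

s3deploy.BucketDeployment(
    self, "WebsiteDeployment",
    sources=[s3deploy.Source.asset(place_holder_source)],
    destination_bucket=website_bucket,
    distribution=distribution,
    retain_on_delete=False,
    memory_limit=512,
)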

@NGL321 NGL321 added @aws-cdk/aws-s3 Related to Amazon S3 and removed needs-triage This issue or PR still needs to be triaged. labels Apr 23, 2020
@armandosoriano

I'm facing the same issue. It logs the file copies until this line:
Completed 109.5 MiB/261.8 MiB (4.7 MiB/s) with 1 file(s) remaining
and then:

[ERROR]	2020-05-04T00:35:03.737Z	529deb7e-e023-4b70-8d42-82a77cc0611e	Command '['python3', '/var/task/aws', 's3', 'cp', 's3://cdktoolkit-xxxxxxxxx/assets/x.zip', '/tmp/tmpwl1hisdf/dca9c054-1839-42fa-a5f1-1d702f2e1167']' died with <Signals.SIGKILL: 9>.

@NetaNir NetaNir assigned iliapolo and unassigned NetaNir May 4, 2020
@armandosoriano

@jeshan I just tried increasing memory to 256 MB and it threw [Errno 28] No space left on device; looks like Python is struggling.


iliapolo commented May 8, 2020

@jeshan Wrote:

I think it's because of the s3 sync is checking thousands of times if the files exist.

I am not versed enough in how the CLI optimizations work, but it's possible. In any case, we already have a feature request for toggling the --delete flag, which I assume might be causing the behavior you are describing.

Feel free to 👍 it!

@armandosoriano Wrote:

@jeshan I just tried increasing memory to 256 MB and it threw [Errno 28] No space left on device; looks like Python is struggling.

I don't think this is related to Python per se.
I suspect that increasing the memory just made the download work a little faster, giving it enough time to hit the 500MB limit on Lambda's /tmp storage. Unfortunately that limitation is harder to work around.
So if your deployment is larger than 500MB, you'll need to manually run aws s3 cp or aws s3 sync from your CI machine for now, until we come up with a solution for this.

In any case, the 15-minute time limit is a hard Lambda limit, so no matter how many optimizations or configurations we provide, if the sync/cp command takes longer than that, it's going to be a problem.

I am leaning towards closing this issue, as it seems to overlap with existing issues.

Thoughts?

@iliapolo iliapolo added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 8, 2020

jeshan commented May 8, 2020

Before closing, let's share ideas that are beyond the scope of #953.

Instead of using aws s3 cp, we could do what it's doing in parallel from the Python code (using regular Python multi-threading). I'm not sure how good an idea this is.
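
A rough sketch of that idea, assuming the handler walked the extracted contents_dir itself and uploaded with boto3 plus a thread pool instead of shelling out to the CLI (upload_tree, bucket_name and key_prefix are hypothetical names):

# Hypothetical sketch: parallel uploads with plain Python threading.
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")

def upload_tree(contents_dir, bucket_name, key_prefix="", workers=16):
    def upload_one(path):
        key = key_prefix + os.path.relpath(path, contents_dir)
        s3.upload_file(path, bucket_name, key)

    files = [
        os.path.join(root, name)
        for root, _, names in os.walk(contents_dir)
        for name in names
    ]
    # Upload concurrently; list() forces any upload exception to surface here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upload_one, files))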

so no matter how many optimizations or configurations we provide

Yes, but at least we'd be able to make quite a bit of progress in the right direction.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 8, 2020
@armandosoriano

@iliapolo our deployment weighs less than 300MB, so I'm not sure it is really related to that limit.

Indeed, I'm just guessing, but it looks like the Python process gets stuck during the file copy operation while it still reports a 5-8 MB/s speed, which itself looks faulty.

Maybe the copy completes properly but for some reason the process does not end.
Maybe the process ends but that is never reported, and so the timeout is reached.

Without more knowledge about how this is managed it's difficult to know. I hope these guesses can at least help your investigation.

@iliapolo

@armandosoriano @jeshan thanks for the feedback.

Keeping this on our docket as we discuss what the best course of action here would be.

@iliapolo

Closing in favor of #7950
