aws-cdk-lib/aws-lambda-nodejs: cache esbuild results #26020

Open
2 tasks done
bestickley opened this issue Jun 16, 2023 · 22 comments
Labels
@aws-cdk/aws-lambda-nodejs effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2

Comments

@bestickley

Describe the feature

The NodejsFunction construct should intelligently cache build results of esbuild and reuse them on subsequent deploys.

new NodejsFunction(this, "CachedNodejsLambda", { bundling: { cache: true } });

Use Case

When working on a CDK app with many lambdas, deployments can take longer than I'd like. I want this to be faster so that the CDK provides a better DX and faster deployments. Work smarter, not harder, right? ;)

Proposed Solution

  1. Use es-module-lexer to find all files imported by the entry point file.
  2. Compute a hash of those files.
  3. Use that hash as the bundling.assetHash of the construct.
  4. Enjoy faster synths/deployments!

Technical considerations:

  • Ensure that source code from local workspaces (e.g. pnpm workspaces) is included in the hash, so that changes there invalidate the cache
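
The proposed steps could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a naive regular expression as a stand-in for es-module-lexer, follows relative imports only (so packages from node_modules or local workspaces are not covered), and the helper names `collectSourceFiles` and `hashSourceTree` are hypothetical.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Naive stand-in for es-module-lexer. Matches relative specifiers in
// `from "./x"`, `import "./x"`, `import("./x")`, and `require("./x")`.
// It will also match imports inside comments or strings.
const RELATIVE_IMPORT =
  /(?:from\s+|import\s+|import\s*\(\s*|require\s*\(\s*)["'](\.{1,2}\/[^"']+)["']/g;
const EXTENSIONS = ["", ".ts", ".tsx", ".js", ".mjs", ".cjs", "/index.ts", "/index.js"];

// Resolves a relative specifier to an existing file, or null if none is found.
function resolveImport(fromFile, specifier) {
  const base = path.resolve(path.dirname(fromFile), specifier);
  for (const ext of EXTENSIONS) {
    const candidate = base + ext;
    if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) return candidate;
  }
  return null;
}

// Step 1: walk the import graph starting at the entry point.
function collectSourceFiles(entry, seen = new Set()) {
  const file = path.resolve(entry);
  if (seen.has(file)) return seen;
  seen.add(file);
  const source = fs.readFileSync(file, "utf-8");
  for (const match of source.matchAll(RELATIVE_IMPORT)) {
    const resolved = resolveImport(file, match[1]);
    if (resolved) collectSourceFiles(resolved, seen);
  }
  return seen;
}

// Step 2: hash the relative path and content of every file, in sorted order.
function hashSourceTree(entry) {
  const hash = crypto.createHash("sha256");
  const root = path.dirname(path.resolve(entry));
  for (const file of [...collectSourceFiles(entry)].sort()) {
    hash.update(path.relative(root, file));
    hash.update("\0");
    hash.update(fs.readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}
```

The resulting hex string is what step 3 would pass as bundling.assetHash. Files that the entry point never imports do not affect it.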

Other Information

If this is out of scope of the AWS CDK (which I hope it is not), @NimmLorr has documented a solution using turbopack. See this comment.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

N/A

Environment details (OS name and version, etc.)

N/A

@bestickley bestickley added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 16, 2023
@pahud
Contributor

pahud commented Jun 19, 2023

We had a similar discussion in 2020 (#10286), and the conclusion was not to include additional npm modules and to use docker instead. Further discussion is welcome if this is still relevant.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jun 19, 2023
@bestickley
Author

bestickley commented Jun 19, 2023

@pahud, while esbuild isn't bundled into the AWS CDK module, it is still used by the AWS CDK NodejsFunction construct. Can something similar be done with es-module-lexer? In order to set cache: true, you'd need to install a peer dependency.

es-module-lexer seems pretty bare-bones based on my minimal research. I'm curious, does anyone else know of a module that can give you a list of all dependencies imported by a module? I couldn't find one.

@tmokmss
Contributor

tmokmss commented Jun 20, 2023

What if we just allowed setting assetHashType to SOURCE? It would skip bundling when the source files have not changed.

assetHashType: options.assetHash ? cdk.AssetHashType.CUSTOM : cdk.AssetHashType.OUTPUT,

In PythonFunction, the hash type is SOURCE by default, but idk why it isn't in NodejsFunction.

@bestickley
Author

@tmokmss, does assetHashType impact whether or not the function is bundled? I thought it was only for uploading the asset to S3? I'm asking for more of a "bundleHash"

@tmokmss
Contributor

tmokmss commented Jun 20, 2023

@bestickley Yes, if the asset hash is the same (i.e. there is already a bundled result directory for that hash), bundling is skipped.

if (fs.existsSync(bundleDir)) { return; }
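
To make the effect of that check concrete, here is a minimal sketch of the same idea outside the CDK. The helper name `bundleIfMissing`, the `asset.<hash>` directory naming, and the temp-directory-then-rename step are assumptions for illustration, not a copy of the CDK source; `bundle` is whatever callback produces the output.

```javascript
const fs = require("fs");
const path = require("path");

// Runs `bundle(outputDir)` only when no directory exists yet for this hash.
// Hypothetical helper: names and layout are illustrative.
function bundleIfMissing(outRoot, assetHash, bundle) {
  const bundleDir = path.join(outRoot, `asset.${assetHash}`);
  if (fs.existsSync(bundleDir)) {
    return { bundleDir, skipped: true }; // same hash as before: reuse the output
  }
  // Bundle into a temp directory first, so a failed bundle never leaves
  // a half-written directory that later looks like a cache hit.
  const tempDir = `${bundleDir}.tmp`;
  fs.rmSync(tempDir, { recursive: true, force: true });
  fs.mkdirSync(tempDir, { recursive: true });
  bundle(tempDir);
  fs.renameSync(tempDir, bundleDir);
  return { bundleDir, skipped: false };
}
```

This is why the choice of hash matters: with a hash derived from the bundled output, the bundle has to run before the hash is known, whereas a hash derived from the sources (or supplied by the user) is known up front and lets the check short-circuit.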

@bestickley
Author

bestickley commented Jun 28, 2023

I'm still looking for time to dedicate to this, but wanted to document this. I found a library that could do all the heavy lifting of finding the dependency tree: https://github.com/dependents/node-dependency-tree

EDIT: or this one too: https://www.npmjs.com/package/@vercel/nft

@fab-mindflow

I believe this should be prioritized with #24456 to address very slow builds in large CDK projects with lambdas.

@piotrmoszkowicz
Contributor

I would love to have that feature within CDK; it takes ages to build our Lambda-dependent stacks!

@ghost

ghost commented Sep 27, 2023

This solution doesn't necessarily deal with caching, but in the meantime a potential workaround that I ended up implementing was to create a prebuild step that bundles all the lambdas in parallel and then use lambda.Code.fromAsset. We were able to shorten lambda bundling from 50+ seconds to under 1 second.
https://gist.github.com/mhyland-phoenicia/c16ed0907c264fc767215e6cb214e5ef
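
For readers who do not want to open the gist, the general shape of such a prebuild step might look like the sketch below. It is not the gist's code. `prebuildAll` is a hypothetical helper, and `bundleOne(entry, outDir)` is an assumed async callback supplied by the caller, for example a thin wrapper around an esbuild build call.

```javascript
const path = require("path");

// Bundles every entry point concurrently and returns a map of
// entry point -> output directory. Hypothetical helper for illustration.
async function prebuildAll(entries, outRoot, bundleOne) {
  const results = await Promise.all(
    entries.map(async (entry) => {
      // Derive a unique directory name from the whole path, because many
      // lambdas share the file name "index.ts".
      const name = entry.replace(/[\\/.:]+/g, "_");
      const outDir = path.join(outRoot, name);
      await bundleOne(entry, outDir);
      return [entry, outDir];
    }),
  );
  return Object.fromEntries(results);
}
```

Each output directory can then be handed to lambda.Code.fromAsset in the CDK app, so synth no longer bundles anything itself.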

@whitakersp-fineos

Would love to have this feature. Our builds are super slow because of lambda building after adding powertools and prisma to our lambdas.

@LeoLapworthKT

> Would love to have this feature. Our builds are super slow because of lambda building after adding powertools and prisma to our lambdas.

Wondering if https://github.com/CloudSnorkel/cdk-turbo-layers helps?

@whitakersp-fineos

> > Would love to have this feature. Our builds are super slow because of lambda building after adding powertools and prisma to our lambdas.
>
> Wondering if https://github.com/CloudSnorkel/cdk-turbo-layers helps?

That's not using NodejsFunction, which provides a lot more capability than a plain Function.

@LeoLapworthKT

> > Wondering if https://github.com/CloudSnorkel/cdk-turbo-layers helps?
>
> That's not using NodejsFunction, which provides a lot more capability than a plain Function.

We use turbo-layers to bundle third-party dependencies, not the lambda function itself, into a layer (which is built in CloudFormation and only rebuilds when the dependencies change). We ALSO extract the list of packages in that layer and pass it to NodejsFunction:

      layers: [ thePackagerLayer ],
      bundling: {
        externalModules,
      },

We are still using NodejsFunction, but don't have the deploy overhead (if our dependencies haven't changed) of building third-party packages again.

So it doesn't solve caching the build of your Node functions, but it does remove the building of the dependencies.
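
One way the externalModules list above could be derived is sketched below. It assumes the layer is built from a package.json manifest and that every package listed under its dependencies ends up in the layer; the helper name `externalModulesFromManifest` is hypothetical, and the wrapper code mentioned in this comment may work differently.

```javascript
const fs = require("fs");

// Returns the package names declared in a manifest, suitable for passing as
// bundling.externalModules so the bundler leaves them to the layer.
// Assumption: the manifest's "dependencies" are exactly what the layer ships.
function externalModulesFromManifest(manifestPath) {
  const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf-8"));
  return Object.keys(manifest.dependencies ?? {}).sort();
}
```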

We also added a BUILD_STACK env variable. If it is set, only our core stacks plus those matching its value are built, which minimises what CDK looks at. This has helped with deploying in development. We added it before watch and hotswap deployments existed, and we still use it even with those.
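
A filter like the BUILD_STACK one could be as small as the sketch below. The stack names in `CORE_STACKS`, the helper name `shouldBuildStack`, and the choice of substring matching are all assumptions for illustration; the thread does not say how matching is done.

```javascript
// Hypothetical list of stacks that are always built.
const CORE_STACKS = new Set(["NetworkStack", "DataStack"]);

// Decides whether a stack should be instantiated in the CDK app.
function shouldBuildStack(stackName, buildStack = process.env.BUILD_STACK) {
  if (!buildStack) return true; // BUILD_STACK unset: build everything
  // Assumption: "match" means the stack name contains the BUILD_STACK value.
  return CORE_STACKS.has(stackName) || stackName.includes(buildStack);
}
```

In the app entry point, each `new SomeStack(app, name, ...)` call would be wrapped in `if (shouldBuildStack(name))`, so stacks that are filtered out never run their bundling.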

If anyone is interested I can see about putting the code we wrap Turbo-Layers with somewhere public

@ShivamJoker

Bundling is too slow right now.
We need to get this merged.

@AllanOricil

Is this going to be improved this year?

@AllanOricil

AllanOricil commented Mar 23, 2024

After adding the assetHash: cdk.AssetHashType.SOURCE property to the bundling object in NodejsFunction, my lambda functions are no longer rebuilt unless there is a change in the code.

So, can't this issue be closed? If not, can you explain why?

@AllanOricil

After changing my lambda code, and rebuilding its cdk stack, no new asset was bundled. Is there an open issue for it?

@AllanOricil

I made a mistake. assetHashType does not exist in NodejsFunction.

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

@github-actions github-actions bot added p1 and removed p2 labels Mar 24, 2024
@AllanOricil

AllanOricil commented Mar 27, 2024

I was able to speed up my builds using assetHash and an S3 bucket to store cdk.out. These are the steps I followed.

  1. Compute a hash with esbuild for each of your lambda entrypoints and store the hashes in a file. I stored them as a JSON object, where the key is the lambda entrypoint path and the value is the computed hash.

  2. In your CDK project, set the assetHash of each lambda to the hash found in the file you created in step 1. Use the entrypoint path to look up the hash.

  3. After the first bundling, store cdk.out somewhere your CI automation can retrieve it. I stored it in an S3 bucket. On every new build, update your cached cdk.out if it is dirty (has diffs). Clear the cache folder every now and then, or whenever you want a full build.

Just by caching cdk.out I was able to cut the time of my builds by half.

@ShivamJoker

Can you share any code sample for this? @AllanOricil

@AllanOricil

AllanOricil commented Mar 29, 2024

@ShivamJoker this is the script I use to compute hashes for my lambdas.

/* eslint-disable */
const fs = require("fs");
const crypto = require("crypto");
const esbuild = require("esbuild");
const path = require("path");
const pkg = JSON.parse(fs.readFileSync("package.json", "utf-8"));

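// Bundles the entry point in memory and returns the SHA-256 of the output,
// so the hash changes only when the bundled code changes.
async function computeHash(entryPoint) {
  const result = await esbuild.build({
    platform: "node",
    entryPoints: [entryPoint],
    write: false, // keep the output in memory instead of writing it to disk
    bundle: true,
    treeShaking: true,
    minify: true,
    // Dependencies listed in package.json are not bundled, so upgrading
    // them does not change the hash.
    external: Object.keys(pkg.dependencies ?? {}),
  });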
  const hash = crypto.createHash("sha256");
  hash.update(Buffer.from(result.outputFiles[0].contents));
  return hash.digest("hex");
}


(async () => {
  const entryPoints = []; // Lambda entrypoint paths
  let hashes = {};
  for (let ep of entryPoints) {
    hashes[ep] = await computeHash(ep);
  }

  fs.writeFileSync(
    "./resources/lambda/computed-hashes.json",
    JSON.stringify(hashes, null, 2),
  );
})();

This is how I use the hashes located at ./resources/lambda/computed-hashes.json in the CDK when creating my lambdas:

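import * as fs from "fs";
import * as path from "path";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";

// Keys must match the entry point paths used when the hashes were computed.
const computedLambdaHashes = JSON.parse(
  fs.readFileSync(path.resolve("../resources/lambda/computed-hashes.json"), "utf-8"),
);
const entryPath = path.resolve("../resources/lambda/lambda-path/index.ts");
const assetHash = computedLambdaHashes[entryPath]; // look up the hash by entry path

new NodejsFunction(this, "function", {
  entry: entryPath,
  handler: "main",
  runtime: lambda.Runtime.NODEJS_18_X,
  bundling: {
    assetHash, // an unchanged hash lets CDK reuse the existing bundle
  },
});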

During CI automation, cdk.out is downloaded from an S3 bucket before the build and synth steps, and uploaded back to S3 after synth.

- |
  if aws s3 ls s3://my-cache-bucket/cdk.out.tar.gz; then
    aws s3 cp s3://my-cache-bucket/cdk.out.tar.gz cdk.out.tar.gz
    tar -zxf cdk.out.tar.gz
  else
    echo "File not found."
  fi
- npm run build
- npm run cdk:synth:all
- tar -zcf cdk.out.tar.gz ./cdk.out
- aws s3 cp cdk.out.tar.gz s3://my-cache-bucket/cdk.out.tar.gz
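
The pipeline above uploads the archive on every run. To upload only when cdk.out is dirty, as step 3 describes, one option is to fingerprint the directory before and after synth and compare. The sketch below is one possible way to do that, not the author's script; `listFiles` and `fingerprintDir` are hypothetical helper names.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Recursively lists all regular files under `dir`.
function listFiles(dir) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(full));
    else if (entry.isFile()) out.push(full);
  }
  return out;
}

// SHA-256 over the relative path and content of every file, in sorted order,
// so the result is stable across runs and changes when any file changes.
function fingerprintDir(dir) {
  const hash = crypto.createHash("sha256");
  for (const file of listFiles(dir).sort()) {
    hash.update(path.relative(dir, file));
    hash.update("\0");
    hash.update(fs.readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}
```

A CI step could record `fingerprintDir("cdk.out")` right after extracting the cached archive and again after synth, and skip the tar and upload steps when the two values are equal.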

@pahud pahud added p2 and removed p1 labels Jun 11, 2024
9 participants