Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundling for Lambda requires unnecessary dependencies #5516

Closed
3 tasks done
perpil opened this issue Nov 22, 2023 · 18 comments
Closed
3 tasks done

Bundling for Lambda requires unnecessary dependencies #5516

perpil opened this issue Nov 22, 2023 · 18 comments
Assignees
Labels
closed-for-staleness feature-request New feature or enhancement. May require GitHub community feedback. p2 This is a standard priority issue

Comments

@perpil
Copy link

perpil commented Nov 22, 2023

Checkboxes for prior research

Describe the bug

For Lambda coldstarts, it is well known creating a minified, tree-shaken bundle measurably improves performance vs. using the V3 JavaScript SDK version bundled with Lambda.

coldstart time (ms) bundled/minified
675 no
241 yes

In a minified bundle, if you use any AWS client, credential-provider-node starts pulling in client-sso and other packages. These packages are unused but add > 32KB to the minified bundle. If you set the credentials parameter for the AWS client, it still pulls in client-sso and can't be tree-shaken out. Since loading credentials from environment variables is what most developers need for Lambda, there should be a way to only bundle the credential-provider-env credentials provider. Arguably this should be the default in the Lambda environment.

SDK version number

@aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.9.0

Reproduction Steps

  1. Create a lambda with any AWS client, i.e. STS:
import { GetCallerIdentityCommand, STSClient } from '@aws-sdk/client-sts';

const stsClient = new STSClient({region: process.env.AWS_REGION});

export async function handler(event, context) {
  const result = await stsClient.send(new GetCallerIdentityCommand({}));
  return {
    statusCode: 200,
    body: JSON.stringify({
      result
    }),
    headers: {
      'Content-Type': 'application/json',
    },
  };
}
  1. Minify, treeshake and create metafile with esbuild. If you use NodeJsFunction in the CDK your bundling looks like:
  bundling: {
        metafile: true, //use https://esbuild.github.io/analyze/ to analyze the bundle
        minify: true,
        treeshake: true,
        target: 'esnext',
        format: 'esm',
        platform: 'node',
        mainFields: ['module', 'main'],
        outputFileExtension: '.mjs',
        externalModules: [],
        banner:
          "const require = (await import('node:module')).createRequire(import.meta.url);const __filename = (await import('node:url')).fileURLToPath(import.meta.url);const __dirname = (await import('node:path')).dirname(__filename);",
  }
  1. Review the metafile using the esbuild analyzer and note it includes packages like client-sso
    I've attached the metafile for the above so you can directly review it: index.meta.json
    Look at it in Flamechart mode and note the inclusion of these packages: client-sso, token-providers, credential-provider-sso
    Screenshot of why token-providers is not tree-shaken: image

Observed Behavior

The packages client-sso, token-providers and credential-provider-sso get bundled unnecessarily and can't be tree-shaken out.

Expected Behavior

Either it uses credential-provider-env, or client config has some way of allowing me to omit credential-provider-node so the unnecessary packages can be tree-shaken out.

Possible Solution

No response

Additional Information/Context

No response

@perpil perpil added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 22, 2023
@perpil
Copy link
Author

perpil commented Nov 24, 2023

When I replaced the compiled version of this file with the contents below in my minified bundle, the sso code treeshook out and I saw a 41KB reduction in bundle size and a 39 ms reduction in cold start time:

import { fromEnv } from '@aws-sdk/credential-provider-env';
import {
  chain,
  CredentialsProviderError,
  memoize,
} from '@smithy/property-provider';
export const defaultProvider = (init = {}) =>
  memoize(
    chain(...[fromEnv()], async () => {
      throw new CredentialsProviderError(
        'Could not load credentials from any providers',
        false
      );
    }),
    (credentials) =>
      credentials.expiration !== undefined &&
      credentials.expiration.getTime() - Date.now() < 300000,
    (credentials) => credentials.expiration !== undefined
  );

The impact to coldstart is even greater if I use the sdk which is included on disk by the Lambda runtime. I have done a test where I minified everything but used what was on disk for client-sso, token-providers and credential-provider-sso and saw a 270-335 ms impact.

You can see how I benchmarked and repeat it yourself using this repo.

@RanVaknin
Copy link
Contributor

Hi @perpil ,

Thanks for the reproduction. I can confirm the observed behavior. I'll discuss it with the team and get back to you.

All the best,
Ran~

@RanVaknin RanVaknin self-assigned this Nov 28, 2023
@kuhe kuhe added feature-request New feature or enhancement. May require GitHub community feedback. and removed bug This issue is a bug. labels Nov 28, 2023
@RanVaknin RanVaknin added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 29, 2023
@RanVaknin
Copy link
Contributor

Hi @perpil ,

After discussing this with the team I got some more context about this.

We provide the entire credential provider chain so that the SDK code deployed can be platform agnostic, whether it runs on EC2, Lambda, EKS etc, it should just work. Since the bundling process happens prior to the deployment part, the bundler cannot know which platform its bundling for and so omitting the unnecessary providers is not possible.

You can probably use something like Yarn Patch, but I cannot officially recommend it.

Regarding this part

The impact to coldstart is even greater if I use the sdk which is included on disk by the Lambda runtime. I have done a test where I minified everything but used what was on disk for client-sso, token-providers and credential-provider-sso and saw a 270-335 ms impact.

Using the SDK provided by lambda is going to be less optimized because it fits an even greater array of use cases. Cold start times are in fact lower when bundling and minifying as shown in our own benchmarking.

Currently there is no way / plan to fix this.
Thank you for taking the time to raise the feature request, and I hope that this does not discourage you from engaging with us in the future.

All the best,
Ran~

@RanVaknin RanVaknin closed this as not planned Won't fix, can't repro, duplicate, stale Nov 29, 2023
@perpil
Copy link
Author

perpil commented Dec 5, 2023

Hi @RanVaknin

I understand the general purpose nature of the provider and the fact that bundling is not aware of the target platform. I am also aware that I can patch the provider myself and am successfully doing so using patch-package. The key point is that this is slowing down coldstarts for a large cross section of users (everyone using the V3 SDK and Lambda). The Lambda team would need to take incredible engineering effort to reduce coldstarts by 39 ms whereas making a change to the SDK comes at a much lower cost. There are multiple ways of solving this without breaking support for other platforms and maintaining backwards compatibility.

  1. Lazy loading. Since fromEnv is one of the first in the chain, there is no need to load other providers if there is a hit. https://aaronstuyvenberg.com/posts/lambda-lazy-loading
  2. Lazy loading based on environment variables. Either setting an environment variable for different behavior or autodetecting based on the presence of lambda specific environment variables.
  3. Allowing the developer to specify a custom provider as config that completely overrides the provider being used.

Any of these options would be better than what we have. I encourage you to reach out to former Lambda principal David Yanacek or current principal Nik Pinski before making a final decision. I spoke to both of them at re:invent and they understand the positives that reducing the cold start for everyone would have. It may seem small for the SDK team, but would be a big win for the Lambda team.

Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2023
@RanVaknin RanVaknin reopened this Jan 25, 2024
@RanVaknin
Copy link
Contributor

RanVaknin commented Jan 25, 2024

Hi @perpil,

Thank you for your patience.

@kuhe is working on implementing lazy loading solution and we hope that the changes will get rolled into the next SDK version provided by lambda but we don't have any specific timeline at the moment.

Thanks again for your contribution.
All the best,
Ran~

@RanVaknin RanVaknin added the queued This issues is on the AWS team's backlog label Jan 25, 2024
@kuhe
Copy link
Contributor

kuhe commented Jan 26, 2024

@perpil in https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.499.0, we released dynamic imports (lazy loading) for the node credential chain providers.

SSO and other credential providers including their dependency clients will not be loaded if you resolve to ENV credentials or otherwise pass credentials directly to a client. If you use Webpack for Node.js apps with the SDK, these dynamic imports will be split into other chunks and not loaded unless necessary.

At this time it is not available on AWS Lambda unless you bundle in or upload the node_modules/@aws-sdk and node_modules/@smithy parts of your application.

@kuhe kuhe self-assigned this Jan 26, 2024
@aws aws unlocked this conversation Jan 26, 2024
@perpil
Copy link
Author

perpil commented Jan 27, 2024

Thanks for the update. I'm using esbuild and bundling. I modified my repro to use 3.499 so you can see the behavior yourself. I'm expecting the delta between me hand patching the defaultProvider to only include fromEnv and the lazy loaded chain in 3.499 to be closer in time, but I'm still seeing ~27ms difference. When I omit the SSO packages from the bundle so it uses the version on disk for lambda, I see an even higher latency of 312ms. This leads me to believe that for some reason, the SSO stuff is still getting loaded. I'll keep digging to see if I can figure out exactly what is going on.

SSO and other credential providers including their dependency clients will not be loaded if you resolve to ENV credentials or otherwise pass credentials directly to a client.

I don't think this is currently true. As an experiment I tried setting both credentials and defaultCredentialProvider to the fromEnv() provider when I instantiate the STSClient. It seems it loads the defaultProvider regardless due to this code: https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sts/src/runtimeConfig.ts#L7 Due to whatever is going on above, I think it is still loading the SSO client.

At this time it is not available on AWS Lambda unless you bundle in or upload the node_modules/@AWS-SDK and node_modules/@smithy parts of your application.

Yep understood I'm bundling it.

@perpil
Copy link
Author

perpil commented Jan 27, 2024

I've tracked down what is still loading the SSO client. It's @aws-sdk/credential-provider-ini. If you comment out lines 7 and 53-55 in this file the SSO client is no longer loaded and latency is normal. It's not clear to me why the SSO bits are loaded (but unused) even when credentials-provider-ini is lazy loaded in the default provider.

The way to see this in my repro is:

  1. Make the SSO packages external
  2. Run a cold start

When you look in the logs, you'll see an Init Duration of >400ms.

To remove the SSO impact:

  1. Open node_modules/@aws-sdk/credential-provider-ini/dist-es/resolveProfileData.js, comment out lines 4 and 24-26 and save it.
  2. cdk synth;cdk deploy
  3. hit the url

You'll see an Init Duration closer to ~200ms.

Rule of thumb, if you see an Init Duration >400 ms it's loading SSO, if it's closer to 200 it isn't. By making the SSO packages external, they are loaded from disk so the latency hit is very apparent in the Init Duration logs.

@kuhe
Copy link
Contributor

kuhe commented Jan 29, 2024

We can check what is loaded by logging the require.cache.

We have additional updates pending to defer both the providers and the underlying clients used by providers, such as SSO, STS, SSO OIDC.

@kuhe
Copy link
Contributor

kuhe commented Jan 29, 2024

#5681 contains a second phase of implementing deferred loading.

@perpil
Copy link
Author

perpil commented Jan 30, 2024

Just tested with 3.502. Init Duration is on par with the patched fromEnv defaultProvider if I omit the SSO packages from the bundle. I've updated my repo with a patch for 3.502 and the new timings/bundled file sizes. I was surprised that the bundled filesizes vary so much between releases, I was expecting them to be similar in size so I might look at what changed between 3.499 and 3.502 deeper.

It's still wonky that I need to mark packages as external to get the same performance, but definitely better and should greatly improve coldstart performance if someone uses the on disk version once Lambda updates their runtime to 3.502.

It would still be nice if you could override the credentials provider in client config and all of the packages in the defaultProvider would treeshake out.
i.e. if you did something like this:

import { fromEnv } from '@aws-sdk/credential-provider-env';
const stsClient = new STSClient({
  region: process.env.AWS_REGION,
  credentials: fromEnv(),
});

The way things are currently architected the defaultProvider always pulls everything in. Is there a way to specify a different runtime config?

@kuhe
Copy link
Contributor

kuhe commented Jan 30, 2024

need to mark packages as external

This shouldn't be necessary. With the introduction of async imports, bundlers should generate the unused code into separate chunks that are never loaded. If you don't mark e.g. @aws-sdk/credential-provider-sso as external, it appears in your root chunk?

@perpil
Copy link
Author

perpil commented Jan 30, 2024

I'm using esbuild, you can see my config here Let me turn on splitting and see what happens.

@kuhe
Copy link
Contributor

kuhe commented Jan 30, 2024

I see, it's not supported at the same level as in webpack. In that case marking external for esbuild is a good way to induce the splitting.

@perpil
Copy link
Author

perpil commented Jan 30, 2024

I tried using splitting, but it appears incompatible with the CDK esbuild wrapper which is always setting outfile.

Adding:

esbuildArgs: {
          '--splitting': true,
}

results in:

✘ [ERROR] Must use "outdir" when code splitting is enabled

And adding:

esbuildArgs: {
          '--splitting': true,
          '--outdir': 'node_modules'
}

results in:

✘ [ERROR] Cannot use both "outfile" and "outdir"

@perpil
Copy link
Author

perpil commented Mar 19, 2024

I think this can be resolved, the splitting thing is a cdk peculiarity.

@kuhe kuhe added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed queued This issues is on the AWS team's backlog labels Mar 19, 2024
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Mar 24, 2024
Copy link

github-actions bot commented Apr 9, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
closed-for-staleness feature-request New feature or enhancement. May require GitHub community feedback. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

3 participants