From 059223d3860485e77ecb95951354e419d913377d Mon Sep 17 00:00:00 2001 From: George Fu Date: Thu, 12 Sep 2024 13:27:12 -0400 Subject: [PATCH] docs: create new Node.js performance docs section and clean up upgrading docs (#6466) * docs: create new Node.js performance docs section and clean up upgrading docs * docs: create lambda -> Node.js performance doc cross-link --- UPGRADING.md | 31 ++- supplemental-docs/AWS_LAMBDA.md | 21 +- supplemental-docs/CLIENTS.md | 39 ++++ supplemental-docs/README.md | 5 + supplemental-docs/performance/README.md | 1 + .../performance/parallel-workloads-node-js.md | 181 ++++++++++++++++++ 6 files changed, 264 insertions(+), 14 deletions(-) create mode 100644 supplemental-docs/performance/parallel-workloads-node-js.md diff --git a/UPGRADING.md b/UPGRADING.md index 2c0c4ff9c3333..ab01956cfe316 100644 --- a/UPGRADING.md +++ b/UPGRADING.md @@ -54,12 +54,13 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa configure them by supplying a new `requestHandler`. Here's the example of setting http options in Node.js runtime. You can find more in [v3 reference for NodeHttpHandler](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-node-http-handler/). - All v3 requests use HTTPS by default. You only need to provide custom httpsAgent. + All v3 requests use HTTPS by default. You can provide a custom agent via the `httpsAgent` + field of the `NodeHttpHandler` constructor input. ```javascript const { Agent } = require("https"); - const { Agent: HttpAgent } = require("http"); const { NodeHttpHandler } = require("@smithy/node-http-handler"); + const dynamodbClient = new DynamoDBClient({ requestHandler: new NodeHttpHandler({ httpsAgent: new Agent({ @@ -71,19 +72,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa }); ``` - If you are passing custom endpoint which uses http, then you need to provide httpAgent. 
+ If you are using a custom endpoint which uses http, then you can provide an `httpAgent`. ```javascript const { Agent } = require("http"); const { NodeHttpHandler } = require("@smithy/node-http-handler"); const dynamodbClient = new DynamoDBClient({ + endpoint: "http://example.com", requestHandler: new NodeHttpHandler({ httpAgent: new Agent({ /*params*/ }), }), - endpoint: "http://example.com", }); ``` @@ -92,6 +93,7 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa ```javascript const { FetchHttpHandler } = require("@smithy/fetch-http-handler"); + const dynamodbClient = new DynamoDBClient({ requestHandler: new FetchHttpHandler({ requestTimeout: /*number in milliseconds*/ @@ -121,14 +123,16 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa - **v3**: **Deprecated**. Requests are _always_ asynchronous. - `xhrWithCredentials` - **v2**: Sets the "withCredentials" property of an XMLHttpRequest object. - - **v3**: Not available. SDK inherits [the default fetch configurations](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch) + - **v3**: the `fetch` equivalent field `credentials` can be set via constructor + configuration to the `requestHandler` config when using the browser + default `FetchHttpHandler`. - [`logger`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#logger-property) - **v2**: An object that responds to .write() (like a stream) or .log() (like the console object) in order to log information about requests. - **v3**: No change. More granular logs are available in v3. - [`maxRedirects`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRedirects-property) - **v2**: The maximum amount of redirects to follow for a service request. - - **v3**: **Deprecated**. SDK _does not_ follow redirects to avoid unintentional cross-region requests. + - **v3**: **Deprecated**. 
SDK _does not_ follow redirects to avoid unintentional cross-region requests. S3 region redirects can be enabled separately with `followRegionRedirects=true` in the S3 Client only. - [`maxRetries`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRetries-property) - **v2**: The maximum amount of retries to perform for a service request. - **v3**: Changed to `maxAttempts`. See more in [v3 reference for RetryInputConfig](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-middleware-retry/#maxattempts). @@ -179,6 +183,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa - **v2**: Whether to use the Accelerate endpoint with the S3 service. - **v3**: No change. +## Error handling + +Top level fields such as `error.code` and http response metadata like the +status code have slightly moved locations within the thrown error object +to subfields like `error.$metadata` or `error.$response`. + +This is because v3 more accurately follows the service models and avoids +adding metadata at the top level of the error object, which may conflict +with the structural error shape modeled by the services. + +See how error handling has changed in v3 +here: [ERROR_HANDLING](./supplemental-docs/ERROR_HANDLING.md). + ## Credential Providers In v2, the SDK provides a list of credential providers to choose from, as well as a credentials provider chain, @@ -348,7 +365,7 @@ variable. ### File System Credentials -- **v2**: [`FileSystemCredentials`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/FileSystemCredentials.html) +- **v2**: [`FileSystemCredentials`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/FileSystemCredentials.html) represents credentials from a JSON file on disk. - **v3**: **Deprecated**. You can explicitly read the JSON file and supply to the client. 
Please open a [feature request](https://github.com/aws/aws-sdk-js-v3/issues/new?assignees=&labels=feature-request&template=---feature-request.md&title=) diff --git a/supplemental-docs/AWS_LAMBDA.md b/supplemental-docs/AWS_LAMBDA.md index 503c323c39229..f6bb7e38b8436 100644 --- a/supplemental-docs/AWS_LAMBDA.md +++ b/supplemental-docs/AWS_LAMBDA.md @@ -2,9 +2,9 @@ ## AWS Lambda provided AWS SDK -Several AWS Lambda runtimes, including those for Node.js, include the AWS SDK at various versions. +Several AWS Lambda runtimes, including those for Node.js, include the AWS SDK at various versions. -The SDK is provided as a convenience for development. For greater control of the SDK version and its runtime characteristics such as +The SDK is provided as a convenience for development. For greater control of the SDK version and its runtime characteristics such as JavaScript bundling, upload your selection of the AWS SDK as part of your function code. To check the version of the SDK that is installed, you can log the package.json metadata of a package that you are using. @@ -16,13 +16,13 @@ const pkgJson = require("@aws-sdk/client-s3/package.json"); exports.handler = function (event) { console.log(pkgJson); return JSON.stringify(pkgJson); -} +}; ``` ## Best practices for initializing AWS SDK Clients in AWS Lambda -Suppose that you have an `async` function called, for example `prepare`, that you need to initialize only once. -You do not want to execute it for every function invocation. +Suppose that you have an `async` function called, for example `prepare`, that you need to initialize only once. +You do not want to execute it for every function invocation. ```js // Example: one-time initialization in the handler code path. @@ -51,7 +51,7 @@ export async function handler(event) { } ``` -There is a potential complication with this style. This is a peculiarity of AWS Lambda's cold/warm states and provisioned concurrency. +There is a potential complication with this style. 
This is a peculiarity of AWS Lambda's cold/warm states and provisioned concurrency. If you make network requests in the `prepare()` function, they may be frozen pre-flight as part of early provisioning. In a certain edge case, time-sensitive signed requests may become invalid due to the delay between provisioning and execution. @@ -65,7 +65,8 @@ let ready = false; export async function handler(event) { if (!ready) { - await prepare(); ready = true; + await prepare(); + ready = true; } // ... } @@ -94,3 +95,9 @@ export async function handler(event) { }); } ``` + +## Parallel request workloads with the AWS SDK on AWS Lambda + +See also the section about parallel workloads in Node.js, which is +applicable to AWS Lambda: +[Performance/Parallel Workloads in Node.js](./performance/parallel-workloads-node-js.md). diff --git a/supplemental-docs/CLIENTS.md b/supplemental-docs/CLIENTS.md index 6aa3598bbd176..c93b5bb309a1e 100644 --- a/supplemental-docs/CLIENTS.md +++ b/supplemental-docs/CLIENTS.md @@ -533,6 +533,45 @@ client.middlewareStack.add( await client.listBuckets({}); ``` +### Middleware Caching `cacheMiddleware`. + +> Available only in [v3.649.0](https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.649.0) and later. + +By default (false), the middleware function stack is resolved on every request, +because the user may modify the middleware stack by adding middleware to the +`client` or `command` instances at any time. + +By contrast, when `cacheMiddleware=true`, the creation of the middleware function stack +is cached on a per-client, per-command-class basis. + +In the following example, the S3 HeadObject Command is called 10 times, but +its middleware function stack is only created once, instead of once per request. 
+ +```ts +// example: middleware caching +import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3"; + +const client = new S3Client({ cacheMiddleware: true }); + +for (let i = 0; i < 10; ++i) { + await client.send( + new HeadObjectCommand({ + Bucket: "...", + Key: String(i), + }) + ); +} +``` + +This caches the combination of `S3Client+HeadObjectCommand`'s resolved +`middlewareStack` upon the first request. This has two key effects: + +- request creation time is reduced by (up to) a few milliseconds per request +- modifying the middleware stack after requests have begun will have no effect. + +**Only enable this feature if you need the marginal increase in +request performance, and are aware of its side-effects.** + ### Dual-stack `useDualstackEndpoint` This is a simple `boolean` setting that is present in most SDK Clients. diff --git a/supplemental-docs/README.md b/supplemental-docs/README.md index 2dd6973b86d9c..26394c29cc9f9 100644 --- a/supplemental-docs/README.md +++ b/supplemental-docs/README.md @@ -14,6 +14,11 @@ Upgrading from AWS SDK for JavaScript (v2) (https://github.com/aws/aws-sdk-js). Best practices for working within AWS Lambda using the AWS SDK for JavaScript (v3). +#### [Performance](./performance/README.md) + +Details what steps the AWS SDK team has taken to optimize performance of the SDK, +and includes tips for configuring the SDK to run efficiently. + #### [TypeScript](./TYPESCRIPT.md) TypeScript tips & FAQ related to this project. 
diff --git a/supplemental-docs/performance/README.md b/supplemental-docs/performance/README.md index 548fc65e69647..f0b3f233dbfff 100644 --- a/supplemental-docs/performance/README.md +++ b/supplemental-docs/performance/README.md @@ -14,3 +14,4 @@ Topics: - [Bundle Sizes](./bundle-sizes.md) - [Dynamic Imports](./dynamic-imports.md) - [Dependency File Count Reduction](./dependency-file-count-reduction.md) +- [Parallel workloads in Node.js](./parallel-workloads-node-js.md) diff --git a/supplemental-docs/performance/parallel-workloads-node-js.md b/supplemental-docs/performance/parallel-workloads-node-js.md new file mode 100644 index 0000000000000..a2495e1663182 --- /dev/null +++ b/supplemental-docs/performance/parallel-workloads-node-js.md @@ -0,0 +1,181 @@ +# Performance > Parallel workloads in Node.js + +Other sections such as bundle sizing, dependency count, and dynamic imports +cover aspects of performance related to the initial startup of your application. + +This section focuses on post-startup performance of request throughput. Specifically, +we cover performance configuration of the AWS SDK for JavaScript (v3) +in Node.js using HTTP/1.1 and the `node:https` module via the SDK's requestHandler +dependency, `@smithy/node-http-handler`. + +## What is a parallel workload? + +A parallel workload is any time you make more than one request +before the first request has completed. + +In single-threaded JavaScript, this is accomplished via the asynchronicity of `Promise`s. + +## Configuration options related to throughput + +Here is an example containing SDK Client configuration options that have +an effect on request throughput. + +```ts +// example: configuring an SDK client for throughput. +import { S3 } from "@aws-sdk/client-s3"; +import { NodeHttpHandler } from "@smithy/node-http-handler"; +import { Agent } from "node:https"; + +const s3 = new S3({ + /** + * Default is false. 
Setting this to true caches + * middleware resolution and prevents modifications + * to the middlewareStack from taking effect. + * + * Use only if you are not adding custom middleware. + */ + cacheMiddleware: true, + requestHandler: new NodeHttpHandler({ + httpsAgent: new Agent({ + /** + * Default is true. This should be left as true + * generally speaking, unless you have a very specific + * use case that needs the alternative. + */ + keepAlive: true, + /** + * See expanded note below about sockets. + * You should use a number that is the size + * of your parallel workload batch. + */ + maxSockets: 50, + }), + }), +}); + +// shorthand syntax available since v3.521.0 +const client = new S3({ + requestHandler: { + requestTimeout: 3_000, + httpsAgent: { maxSockets: 50 }, + }, +}); +``` + +## Client instances + +In this SDK, much functionality is cached for performance reasons, but +the cache is usually associated with the client instance. In particular, +the following are cached on the client instance: + +- credentials fetched by async function calls + - if your client is configured to source credentials from a provider that includes + a network request and/or file-system read, this work is done once per client until + expiration of the credentials. If you instantiate a new client for every request, + this work is repeated each time, which will slow things down substantially. +- middleware function stack when `cacheMiddleware=true` +- `node:https` Agent and its socket pool + +If you do need multiple instances of an SDK client, but don't want to +have separate credentials and socket pools, you can share +credentials and requestHandlers between clients. + +```ts +// example: credential and socket pool sharing from primary client. 
+import { S3 } from "@aws-sdk/client-s3"; + +const s3_east = new S3({ region: "us-east-1" }); + +const { credentials, requestHandler } = s3_east.config; + +const s3_west = new S3({ + region: "us-west-2", + credentials, + requestHandler, +}); +``` + +```ts +// example: credential and socket pool sharing from user instantiated objects. +import { S3 } from "@aws-sdk/client-s3"; +import { fromNodeProviderChain } from "@aws-sdk/credential-providers"; +import { NodeHttpHandler } from "@smithy/node-http-handler"; + +const credentials = fromNodeProviderChain(); +const requestHandler = new NodeHttpHandler({ + httpsAgent: { + maxSockets: 100, + }, +}); + +const s3_east = new S3({ region: "us-east-1", credentials, requestHandler }); +const s3_west = new S3({ region: "us-west-2", credentials, requestHandler }); +``` + +## Node.js Sockets + +The `node:https` Agent class manages sockets on your behalf. The most impactful configuration you can make for parallel workloads is to set +the value of `maxSockets`. + +Configuring the `maxSockets` value for the SDK's requestHandler should +be based on the parallelism or parallel workload batch size of your application +and usage scenario. + +- Configuring too few sockets leads to a slowdown as this is equivalent to + setting a lower cap on the parallel workload batch size. +- Configuring too many sockets can _also_ slow down your application. This is + because the application may open a new socket, which takes some CPU time, when + an existing socket was about to become free for reuse. + - configuring too many sockets can cause you to hit the file descriptor limit of the + operating system. This can manifest as `Error: EMFILE, too many open files` + in Node.js. + +## Example Scenario + +You have 10,000 files to upload to S3. + +- Uploading one at a time is too slow. +- Uploading all at once risks crashing your application process, or + being throttled by the server. 
+ +#### Recommendation + +Test your application to determine the right level of parallel request traffic. +After that, configure the `maxSockets` value to be equal to the batch size, or +a factor of it. + +```ts +// example: workload of 10,000 files, batch size of 100. +import { S3 } from "@aws-sdk/client-s3"; + +const files = [ + /*... */ +]; +const BATCH_SIZE = 100; + +const s3 = new S3({ + requestHandler: { + httpsAgent: { maxSockets: 100 }, + }, +}); + +const promises = []; +while (files.length) { + // splice() removes the batch from `files`, so the loop terminates. + promises.push( + ...files.splice(0, BATCH_SIZE).map((file) => { + return s3.putObject({ + Bucket: "...", + Key: file.name, + Body: file.contents, + }); + }) + ); + await Promise.all(promises); + promises.length = 0; +} +``` + +In this example we've adhered to the best practices mentioned in this section: + +- use one client instance for repeated requests +- set a `maxSockets` value that is a factor of the batch size
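
The batching loop above can also be factored into a small reusable helper. The following is a minimal sketch, not part of the SDK: `runInBatches` and its `task` callback are hypothetical names introduced here for illustration. It assumes each batch should fully settle before the next one starts, which mirrors the `Promise.all` approach in the example and keeps the number of in-flight requests at or below the batch size.

```ts
// Sketch only: `runInBatches` is a hypothetical helper, not an SDK API.
// It runs `task` over `items` in fixed-size batches and waits for each
// batch to finish before starting the next, so that at most `batchSize`
// promises are pending at any time.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // slice() copies the current window and leaves the input array untouched.
    const batch = items.slice(start, start + batchSize);
    // Promise.all rejects as soon as any task in the batch rejects.
    const settled = await Promise.all(batch.map((item) => task(item)));
    results.push(...settled);
  }
  return results;
}
```

With a helper like this, the upload loop could plausibly be written as `await runInBatches(files, BATCH_SIZE, (file) => s3.putObject({ Bucket: "...", Key: file.name, Body: file.contents }))`, with `maxSockets` still set to match `BATCH_SIZE`. Because the helper does not mutate its input, the original `files` array remains available afterwards, for example to retry failed items.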