diff --git a/awscli/topics/s3-config.rst b/awscli/topics/s3-config.rst
new file mode 100644
index 000000000000..4fa8bb8e8e26
--- /dev/null
+++ b/awscli/topics/s3-config.rst
@@ -0,0 +1,135 @@
+:title: AWS CLI S3 Configuration
+:description: Advanced configuration for AWS S3 Commands
+:category: S3
+:related command: s3 cp, s3 sync, s3 mv, s3 rm
+
+The ``aws s3`` transfer commands, which include the ``cp``, ``sync``, ``mv``,
+and ``rm`` commands, have additional configuration you can use to control
+S3 transfers. This topic guide discusses these parameters as well as best
+practices and guidelines for setting these values.
+
+Before discussing the specifics of these values, note that these values are
+entirely optional. You should be able to use the ``aws s3`` transfer commands
+without having to configure any of these values. These configuration values
+are provided in the case where you need to modify one of these values, either
+for performance reasons or to account for the specific environment where these
+``aws s3`` commands are being run.
+
+
+Configuration Values
+====================
+
+These are the configuration values you can set for S3:
+
+* ``max_concurrent_requests`` - The maximum number of concurrent requests.
+* ``max_queue_size`` - The maximum number of tasks in the task queue.
+* ``multipart_threshold`` - The size threshold where the CLI uses multipart
+  transfers.
+* ``multipart_chunksize`` - When using multipart transfers, this is the chunk
+  size that will be used.
+
+These values must be set under the top level ``s3`` key in the AWS Config File,
+which has a default location of ``~/.aws/config``. Below is an example
+configuration::
+
+    [profile development]
+    aws_access_key_id=foo
+    aws_secret_access_key=bar
+    s3 =
+      max_concurrent_requests = 20
+      max_queue_size = 10000
+      multipart_threshold = 64MB
+      multipart_chunksize = 16MB
+
+Note that all the S3 configuration values are indented and nested under the top
+level ``s3`` key.
+
+You can also set these values programmatically using the ``aws configure set``
+command. For example, to set the above values for the default profile, you
+could instead run these commands::
+
+    $ aws configure set default.s3.max_concurrent_requests 20
+    $ aws configure set default.s3.max_queue_size 10000
+    $ aws configure set default.s3.multipart_threshold 64MB
+    $ aws configure set default.s3.multipart_chunksize 16MB
+
+
+max_concurrent_requests
+-----------------------
+
+**Default** - ``10``
+
+The ``aws s3`` transfer commands are multithreaded. At any given time,
+multiple requests to Amazon S3 are in flight. For example, if you are
+uploading a directory via ``aws s3 cp localdir s3://bucket/ --recursive``, the
+AWS CLI could be uploading the local files ``localdir/file1``,
+``localdir/file2``, and ``localdir/file3`` in parallel. The
+``max_concurrent_requests`` value specifies the maximum number of concurrent
+requests that are allowed at any given time.
+
+You may need to change this value for a few reasons:
+
+* Decreasing this value - In some environments, the default of 10 concurrent
+  requests can overwhelm a system. This may cause connection timeouts or
+  slow the responsiveness of the system. Lowering this value will make the
+  S3 transfer commands less resource intensive. The tradeoff is that
+  S3 transfers may take longer to complete.
+* Increasing this value - In some scenarios, you may want the S3 transfers
+  to complete as quickly as possible, using as much network bandwidth
+  as necessary. In this scenario, the default number of concurrent requests
+  may not be sufficient to utilize all the network bandwidth available.
+  Increasing this value may improve the time it takes to complete an
+  S3 transfer.
+
+
+max_queue_size
+--------------
+
+**Default** - ``1000``
+
+The AWS CLI internally uses a producer-consumer model, where we queue up S3
+tasks that are then executed by consumers, which in this case use a bounded
+thread pool, controlled by ``max_concurrent_requests``. The enqueuing rate
+can be much faster than the rate at which consumers are executing tasks.
+To avoid unbounded growth, the task queue size is capped at a specific size.
+This configuration value sets that maximum size.
+
+You generally will not need to change this value. This value also
+corresponds to the number of tasks we are aware of that need to be
+executed. This means that by default we can only see 1000 tasks ahead.
+Until the S3 command knows the total number of tasks to execute, the
+progress line will show a total of ``...``. Increasing this value
+means that we will be able to more quickly know the total number of
+tasks needed, assuming that the enqueuing rate is quicker than the
+rate of task consumption. The tradeoff is that a larger max queue
+size will require more memory.
+
+
+multipart_threshold
+-------------------
+
+**Default** - ``8MB``
+
+When uploading, downloading, or copying a file, the S3 commands
+will switch to multipart operations if the file reaches a given
+size threshold. The ``multipart_threshold`` controls this value.
+You can specify this value in one of two ways:
+
+* The file size in bytes. For example, ``1048576``.
+* The file size with a size suffix. You can use ``KB``, ``MB``, ``GB``,
+  ``TB``. For example: ``10MB``, ``1GB``. Note that S3 imposes
+  constraints on valid values that can be used for multipart
+  operations.
+
+
+multipart_chunksize
+-------------------
+
+**Default** - ``8MB``
+
+Once the S3 commands have decided to use multipart operations, the
+file is divided into chunks. This configuration option specifies what
+the chunk size (also referred to as the part size) should be.
+This value can be specified using the same semantics as
+``multipart_threshold``, that is, either as an integer number of bytes or
+with a size suffix.
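
The size semantics described for ``multipart_threshold`` and ``multipart_chunksize`` can be sketched in a few lines of Python. This is only an illustration, not the AWS CLI's implementation: ``parse_size`` and ``plan_transfer`` are hypothetical names, and the sketch assumes that suffixes are binary multiples (1KB = 1024 bytes), which the ``1048576`` example suggests but the text does not state. It also ignores the constraints S3 itself imposes on multipart operations.

```python
import math

# Assumption: size suffixes are binary multiples (1KB = 1024 bytes).
_SUFFIXES = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(value):
    """Convert a value such as '1048576' or '10MB' to a number of bytes."""
    text = str(value).strip().upper()
    for suffix, multiplier in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * multiplier
    # No suffix: the value is already a byte count.
    return int(text)


def plan_transfer(file_size, threshold="8MB", chunksize="8MB"):
    """Return how many parts a transfer would use (1 means no multipart).

    Hypothetical helper: multipart is used once the file size reaches
    the threshold, and the file is then split into chunksize pieces.
    """
    if file_size < parse_size(threshold):
        return 1
    return math.ceil(file_size / parse_size(chunksize))
```

Under these assumptions, with the example configuration from this topic (a 64MB threshold and a 16MB chunk size), ``plan_transfer(100 * 1024**2, "64MB", "16MB")`` would split a 100MB file into 7 parts, while a 5MB file with the default settings would be sent as a single request.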