Support additional configuration parameters for S3 transfer #4359
Comments
For the record: it seems those options are handled by aws-cli specifically, and not by boto3 itself: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html . This will require parsing the config ourselves and applying the options where needed, inspired by aws-cli (we need to look into the aws-cli source to make it match). All the code related to s3 and s3 configs is in https://github.com/iterative/dvc/blob/master/dvc/tree/s3.py
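To make the comment above concrete, here is a minimal, hypothetical sketch of "parsing the config ourselves": it reads the nested `s3` block of an aws-cli style config with the standard library's `configparser` and maps the aws-cli option names onto boto3 `TransferConfig` keyword arguments. The mapping names on the right are real `TransferConfig` parameters, but the helper functions themselves are illustrative, not DVC code.

```python
import configparser
import math

# aws-cli config name -> boto3 TransferConfig keyword argument
CLI_TO_TRANSFER_CONFIG = {
    "max_concurrent_requests": "max_concurrency",
    "max_queue_size": "max_io_queue",
    "multipart_threshold": "multipart_threshold",
    "multipart_chunksize": "multipart_chunksize",
    "max_bandwidth": "max_bandwidth",
}

_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert aws-cli style sizes such as '500MB' to a byte count."""
    value = value.strip().upper().removesuffix("/S")  # '50MB/s' -> '50MB'
    for suffix, factor in _UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain byte count


def s3_transfer_kwargs(config_text: str, profile: str = "default") -> dict:
    """Extract TransferConfig kwargs from the nested 's3' key of a profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # configparser joins the indented continuation lines of the 's3' key
    # into one multi-line value, which we split back apart here.
    raw = parser.get(profile, "s3", fallback="")
    kwargs = {}
    for line in raw.splitlines():
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        key = key.strip()
        if key in CLI_TO_TRANSFER_CONFIG:
            kwargs[CLI_TO_TRANSFER_CONFIG[key]] = parse_size(val)
    return kwargs
```

The resulting dict could then be splatted into `boto3.s3.transfer.TransferConfig(**kwargs)`; options that do not belong to `TransferConfig` (e.g. `addressing_style`) would need to be routed to the client config instead.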
Another case where having this would help: https://discord.com/channels/485586884165107732/485596304961962003/804679936886571019 (when the individual files are well over 10GB, and boto3 tries to chunk them in little parts)
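The large-file case above matters because S3 caps a multipart upload at 10,000 parts, so with boto3's default 8 MiB `multipart_chunksize` anything over roughly 78 GiB cannot be uploaded without raising the chunk size. A quick back-of-the-envelope helper (illustrative, standard library only):

```python
import math

S3_MAX_PARTS = 10_000  # hard S3 limit on parts per multipart upload


def min_chunksize(file_size: int) -> int:
    """Smallest chunk size (bytes) that keeps a file under the part limit."""
    return math.ceil(file_size / S3_MAX_PARTS)


# boto3's TransferConfig defaults to 8 MiB chunks; 8 MiB * 10,000 parts
# is ~78 GiB, so a 100 GiB file needs a larger multipart_chunksize.
default_chunk = 8 * 1024 ** 2
print(min_chunksize(100 * 1024 ** 3) > default_chunk)  # → True
```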
@efiop I think at least we can support a subset of these variables, the stuff that goes into a
Regarding the issue from Discord: I'm using an S3-compatible API, so could you also include those parameters (multipart_chunk_size, multipart_threshold) in .dvc/config? That way, .dvc/config would override the values saved in ~/.aws/config.
@efiop what do you think about this? Should we only support
Problem

When using S3 as a remote for DVC, boto3 also reads the configuration file ~/.aws/config, typically used by the original AWS CLI tool. At the moment, not all configuration parameters from this file have an effect in DVC. For instance, a line such as multipart_threshold = 500MB will be skipped by DVC, while a line such as signature_version = s3 will be applied when pushing data into S3.

Suggestion
The suggestion is to support optional yet useful parameters from ~/.aws/config, as per the AWS CLI S3 documentation, to control S3 transfers for performance reasons or to unlock advanced configuration capabilities. These parameters should be passed to TransferConfig:

- max_concurrent_requests - The maximum number of concurrent requests.
- max_queue_size - The maximum number of tasks in the task queue.
- multipart_threshold - The size threshold the CLI uses for multipart transfers of individual files.
- multipart_chunksize - When using multipart transfers, this is the chunk size that the CLI uses for multipart transfers of individual files.
- max_bandwidth - The maximum bandwidth that will be consumed for uploading and downloading data to and from Amazon S3.

If it is reasonable and doable, also support the following set of parameters:

- use_accelerate_endpoint - Use the Amazon S3 Accelerate endpoint for all s3 and s3api commands. You must first enable S3 Accelerate on your bucket before attempting to use the endpoint. This is mutually exclusive with the use_dualstack_endpoint option.
- use_dualstack_endpoint - Use the Amazon S3 dual IPv4 / IPv6 endpoint for all s3 and s3api commands. This is mutually exclusive with the use_accelerate_endpoint option.
- addressing_style - Specifies which addressing style to use. This controls whether the bucket name is in the hostname or part of the URL. Valid values are: path, virtual, and auto. The default value is auto.
- payload_signing_enabled - Refers to whether or not to SHA256-sign sigv4 payloads. By default, this is disabled for streaming uploads (UploadPart and PutObject) when using HTTPS.

These values must be set under the top-level s3 key in the AWS config file, which has a default location of ~/.aws/config. Below is an example configuration: