S3 uploader docs question #2228
Comments
Hi @iselind, thanks for reaching out to us about this. Unfortunately the SDK does not support this out of the box yet. Until this is implemented, Issue #232 includes a link to a custom uploader written by @cespare that may work for your use case, or that you could use as a starting point to implement your own custom uploader.
Thanks a lot for looking into my issue, much appreciated! I'll take a look at the other issues you mention and get appropriately involved in those. Your suggestion for my own multipart uploader seems strange to me. In order for an io.Seeker to work, you need the data in seekable storage, i.e. the data needs to be in RAM. The requirement for io.Seeker is, as I understand it, to a large extent the reason why the s3manager requires so much memory. Perhaps I misunderstood your suggestion?
Seeking files and reading sections of them should generally only pull in the relevant pages to the buffer cache. It's definitely not the case that only data in RAM is seekable.
Firstly, how can you in any way or respect be confident that an io.Reader is in fact a file? In our case we get our data from a socket. Secondly, even if the io.Reader is a file, that's still not page based until the file has been either memory mapped or read into RAM. Either way you end up in RAM again. Perhaps I'm missing something vital here?
We can't. That's why my uploader code takes a file-like interface instead of a plain io.Reader.
Hmm, I'm not quite sure what you're getting at here. The pages of the file may be on disk only or they may be on disk and in the page cache (memory). It's true that if you seek to the middle of the file and then read some data (to upload it) then those pages are now in page cache, but the important things are (1) the kernel may drop those pages from the cache later after we're done reading that chunk of the file, and (2) the data for a chunk doesn't need to all be copied into the Go process's heap at once. That's why my uploading code has much lower memory usage than s3manager.Uploader.
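To make the point above concrete, here is a minimal sketch (not the linked uploader code) showing that seeking into a large file and reading one chunk only pulls that chunk's bytes through a small buffer; the file name, chunk offset, and buffer size are arbitrary illustration values.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("large.file") // placeholder path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	const partSize = 5 * 1024 * 1024 // 5 MB chunk

	// Read only the second 5 MB chunk of the file. Seeking does not load
	// the whole file; the kernel pages in just the bytes that are read.
	section := io.NewSectionReader(f, partSize, partSize)

	buf := make([]byte, 64*1024) // small reusable buffer, not the whole chunk
	var total int64
	for {
		n, err := section.Read(buf)
		total += int64(n)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Printf("read %d bytes of the chunk without buffering it all at once\n", total)
}
```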
Thanks for the update. The issue that you're running into seems to be directly related to using an io.Reader as the upload's Body. Most likely the memory error that you're running into is due to the Uploader's default number of concurrent uploads (Concurrency, which defaults to 5) combined with the part size. One potential way to limit the impact of this is to set the Uploader's Concurrency to 1:

```go
// Perform an upload.
resp, err := uploader.Upload(upParams, func(u *s3manager.Uploader) {
	u.Concurrency = 1
})
```
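For readers landing here later, a hedged sketch of what configuring the Uploader up front might look like; the bucket, key, and body below are placeholders, and the 5 MB part size with a concurrency of 1 simply bounds worst-case buffering to roughly PartSize * Concurrency when the Body is a plain io.Reader.

```go
package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	// Bound worst-case buffering: with a non-seekable Body the Uploader
	// holds at most PartSize bytes per in-flight part.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 5 * 1024 * 1024 // 5 MB, the S3 minimum part size
		u.Concurrency = 1            // one part buffered at a time
	})

	body := os.Stdin // stand-in for any io.Reader, e.g. a socket

	resp, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // placeholder
		Key:    aws.String("my-key"),    // placeholder
		Body:   body,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "upload failed:", err)
		os.Exit(1)
	}
	fmt.Println("uploaded to", resp.Location)
}
```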
Another way you could solve this issue with such restricted memory is to use the SDK to generate a presigned URL for the request, then use the stdlib http.Client to stream the object to S3. Presigned URLs support streaming. Currently the AWS SDK for Go does not support streaming uploads with the PutObject API. Here is an example of using a presigned URL for uploading an S3 object: https://github.com/aws/aws-sdk-go/tree/master/example/service/s3/presignURL This pattern will consume the least amount of memory. We have #2036 tracking refactoring the S3 Upload Manager's performance, but either reducing the concurrency or using presigned URLs should work for your memory threshold.
Thanks @jasdel, I'll look into those options and get back to you. I think the presigned URL solution looks the most promising. Would you mind keeping this issue open until I've tested those options? It will probably not take more than a week for me to test this properly.
Thanks for the feedback, sure we can leave the issue open. I'll mark it as pending feedback. Let us know how the change goes.
It seems we hit a snag, perhaps you can see what we're missing. We get the following back as a response:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NotImplemented</Code>
  <Message>A header you provided implies functionality that is not implemented</Message>
  <Header>Transfer-Encoding</Header>
  <RequestId>4FC2B7F1A122D0C6</RequestId>
  <HostId>fKG9On8k+o7mGCpu/FlslvghnsTx9deoMCaFTs9nIDNmUucvX3esKq7Ih53649+M99qK/PnehQ8=</HostId>
</Error>
```

The code looks like this now:

```go
svc := getSession()
fname := fmt.Sprintf("%s-%s-%s-%d.core", meta["SerialNumber"], meta["ProdNbr"], meta["Version"], time.Now().Unix())
putObj, _ := svc.PutObjectRequest(&s3.PutObjectInput{
	Bucket:        aws.String(coreFileBucket),
	Key:           aws.String(fname),
	ContentLength: aws.Int64(0), // We cannot know the length in advance
})
url, err := putObj.Presign(15 * time.Minute)
if err != nil {
	return "", err
}
req, err := http.NewRequest("PUT", url, body)
if err != nil {
	return "", err
}
resp, err := http.DefaultClient.Do(req)
```

I think we've followed the linked example code. I cannot see what we're doing wrong. We get the same issue if we exclude the line setting the content length to 0 in the PutObjectRequest.
@iselind leave the ContentLength out of the PutObjectInput if you don't know the value ahead of time. In addition I suggest also using PresignRequest instead of Presign, so you also get back the set of headers that were signed and need to be sent along with the request.
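Not from the original thread, but one detail worth spelling out for anyone hitting the same NotImplemented / Transfer-Encoding error: Go's http.Client only sets Content-Length automatically for a few body types and otherwise falls back to chunked Transfer-Encoding, which S3 rejects on a plain PUT. Below is a sketch under the assumption that the object size is known and that PresignRequest supplied the signed headers; putWithKnownLength is a hypothetical helper name.

```go
package upload

import (
	"fmt"
	"io"
	"net/http"
)

// putWithKnownLength streams body to a presigned S3 URL. presignedURL,
// body, size, and signedHeaders are assumed inputs; signedHeaders is what
// PresignRequest returned, if that variant was used.
func putWithKnownLength(presignedURL string, body io.Reader, size int64, signedHeaders http.Header) error {
	req, err := http.NewRequest("PUT", presignedURL, body)
	if err != nil {
		return err
	}
	// A known length means the client sends Content-Length rather than
	// Transfer-Encoding: chunked, which S3 rejects with NotImplemented.
	req.ContentLength = size
	for k, v := range signedHeaders {
		req.Header[k] = v
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}
```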
@jasdel Thanks a lot for your insights. We set the content length to 0 because we cannot possibly know the length of the body before receiving it all. As we cannot store it temporarily on disk, we are left with RAM, and then we haven't won anything by using Presign(). I assume PresignRequest() has the same limitation, right? I found this issue which seems very similar to mine; they seem to have solved the content length issue. Can the same be done in the GoLang AWS SDK? I cannot interpret this in any other way than that there is a way to do chunked transfers to S3. I know that issue is for the JavaScript version of the AWS SDK, but the language can hardly be the culprit here, right? Based on the previous discussion, it seems to me that this issue is in essence caused by limitations in the AWS REST API and not really in the SDK. Based on this I'd say changing language to something like C, Python, or Java wouldn't change a thing in this regard. So changing the concurrency to 1 in the uploader seems to be our only way forward in trying to reduce the memory consumption of the uploader. Would you agree?
Would it be possible to bring back streaming uploads for PutObject? The SDK used to support this (many iterations ago ;-) and the underlying API supports it just fine. Supplying the content MD5 is optional, so if the user is willing to live without this protection it would be great if there was a PutObject variant which could stream. I'd like to use this in rclone for uploading files with the minimum memory usage. I know the MD5SUM in advance, so I'd like to set that and stream the content. I'll try to see if I can get the presigned URL uploader to work.
I did get it to work - thanks for the hints above :-)
@ncw I think for the SDK to support streaming we need to add additional functionality to the SDK's internals to allow an API parameter to be an io.Reader instead of requiring an io.ReadSeeker. S3's PutObject and similar methods are special, and support the magic value UNSIGNED-PAYLOAD for the content hash, which removes the need to seek over the body in order to sign the request.
That would be nice. In the meantime, the code above plus your hints makes for a working uploader, provided you know the size of the object in advance. For rclone's use this is fine. For objects whose size isn't known I'll just use s3uploader as before. I note that Backblaze have solved the hash problem in a different way to allow streaming: they allow the last 20 bytes of the upload stream to be the SHA-1 hash. One could also supply hashes in HTTP trailers. Thanks for your help :-)
I am not an expert in either Go or S3, so please recommend the most correct way to stream an HTTP response to S3 without touching the SSD and without waiting to fetch the remote file completely. I also need to use the HTTP body I received after uploading it to S3, without fetching the data again. I found that for reusing the streamed data I can use io.TeeReader. As for streaming from an http.Response: as I read in the messages above, do I still need to use S3's HTTP PUT API with a presigned URL instead of the Go SDK?
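A minimal sketch of the io.TeeReader idea mentioned above, assuming a presigned PUT URL is already in hand and that the "reuse" of the streamed data is computing a SHA-256 over it (any other io.Writer consumer would work the same way); streamToS3 is a hypothetical helper, and the same known-Content-Length caveat from earlier in the thread applies.

```go
package upload

import (
	"crypto/sha256"
	"fmt"
	"io"
	"net/http"
)

// streamToS3 copies src (e.g. an http.Response.Body) to a presigned S3 PUT
// URL while feeding the same bytes to a SHA-256 hash via io.TeeReader, so
// the data is read once from the source but used twice.
func streamToS3(src io.Reader, size int64, presignedURL string) (string, error) {
	h := sha256.New()
	tee := io.TeeReader(src, h) // every byte read for the upload is also written to h

	req, err := http.NewRequest("PUT", presignedURL, tee)
	if err != nil {
		return "", err
	}
	req.ContentLength = size // S3 rejects chunked Transfer-Encoding, so the size must be known

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("upload failed: %s", resp.Status)
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}
```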
This issue can be closed as I see it; it doesn't seem to be going anywhere. It's unfortunate that the S3 API doesn't support streaming. I'll close the issue. If anyone is against this, please re-open.
Please fill out the sections below to help us address your issue.
Version of AWS SDK for Go?
Latest
Version of Go (go version)?
go version go1.11 linux/amd64
What issue did you see?
While using the s3manager to upload a file, it seems the entire file is read into RAM before it is sent. The docs give no clue how we can limit this behaviour. Our code base runs in a Kubernetes pod with a restricted amount of available memory.
Our file sizes can vary from 1 MB to 1 GB. We need a way to control the RAM consumption while uploading these files.
We need a way to stream the file to the S3 bucket without having to read the entire file into RAM before sending it to S3.
Steps to reproduce
Create a Kubernetes pod with a RAM limit of, say, 10 MB and try to upload a 100 MB file to any AWS bucket. The pod will be killed by the OOM killer.
Example code from our code base: