
Failure uploading large files (handling slowDown) #479

Open
wvengen opened this issue Mar 4, 2024 · 10 comments

@wvengen
Contributor

wvengen commented Mar 4, 2024

During a large crawl (2GB+), I get stuck in the "Uploading WACZ" stage (using OpenStack SWIFT S3 for storage). The log shows

{"timestamp":"2024-03-04T12:07:55.250Z","logLevel":"debug","context":"general","message":"WACZ successfully generated and saved to: /crawls/collections/thecrawl/thecrawl.wacz","details":{}}
{"timestamp":"2024-03-04T12:07:55.255Z","logLevel":"info","context":"s3Upload","message":"S3 file upload information","details":{"bucket":"x","crawlId":"x","prefix":"x/"}}
{"timestamp":"2024-03-04T12:08:03.027Z","logLevel":"error","context":"general","message":"Crawl failed","details":{"type":"exception","message":"","stack":"S3Error\n    at Object.parseError (/app/node_modules/minio/dist/main/xml-parsers.js:79:11)\n    at /app/node_modules/minio/dist/main/transformers.js:165:22\n    at DestroyableTransform._flush (/app/node_modules/minio/dist/main/transformers.js:89:10)\n    at DestroyableTransform.prefinish (/app/node_modules/readable-stream/lib/_stream_transform.js:123:10)\n    at DestroyableTransform.emit (node:events:514:28)\n    at prefinish (/app/node_modules/readable-stream/lib/_stream_writable.js:569:14)\n    at finishMaybe (/app/node_modules/readable-stream/lib/_stream_writable.js:576:5)\n    at endWritable (/app/node_modules/readable-stream/lib/_stream_writable.js:594:3)\n    at Writable.end (/app/node_modules/readable-stream/lib/_stream_writable.js:535:22)\n    at IncomingMessage.onend (node:internal/streams/readable:705:10)"}}
{"timestamp":"2024-03-04T12:08:03.036Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: failing","details":{}}

and the error message mentioned is

S3Error
    at Object.parseError (/app/node_modules/minio/dist/main/xml-parsers.js:79:11)
    at /app/node_modules/minio/dist/main/transformers.js:165:22
    at DestroyableTransform._flush (/app/node_modules/minio/dist/main/transformers.js:89:10)
    at DestroyableTransform.prefinish (/app/node_modules/readable-stream/lib/_stream_transform.js:123:10)
    at DestroyableTransform.emit (node:events:514:28)
    at prefinish (/app/node_modules/readable-stream/lib/_stream_writable.js:569:14)
    at finishMaybe (/app/node_modules/readable-stream/lib/_stream_writable.js:576:5)
    at endWritable (/app/node_modules/readable-stream/lib/_stream_writable.js:594:3)
    at Writable.end (/app/node_modules/readable-stream/lib/_stream_writable.js:535:22)
    at IncomingMessage.onend (node:internal/streams/readable:705:10)

Trying to reproduce this, it appears that uploading large files triggers a slowDown response from the S3 server, which the MinIO client does not seem to handle automatically.

dd if=/dev/zero of=/tmp/foo bs=1M count=2k
// Javascript
var Minio = require('minio')
var s3Client = new Minio.Client({ endPoint: 's3.example.com', accessKey: 'xx', secretKey: 'xx', partSize: 100*1024*1024 })
await s3Client.fPutObject('x', 'foo', '/tmp/foo')

eventually gives the error

Uncaught S3Error
    at Object.parseError (/app/node_modules/minio/dist/main/xml-parsers.js:79:11)
    at /app/node_modules/minio/dist/main/transformers.js:165:22 {
  code: 'SlowDown',
  bucketname: 'x',
  requestid: 'x',
  hostid: 'x',
  amzRequestid: null,
  amzId2: null,
  amzBucketRegion: null
}

Amazon mentions that 503 Slow Down responses can occur; see also the S3 best practices, which recommend reducing the request rate.

Do we need support for handling slowDown responses from the S3 endpoint?

@wvengen
Contributor Author

wvengen commented Mar 4, 2024

I did not find anything about minio-js handling slowDown responses, so I don't think it is supported. Either this needs to be handled here, or perhaps the AWS S3 client already has support for it. In any case, the request would need to be retried after some timeout (probably with an increasing delay factor in case the server is not yet ready to proceed).
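
For illustration, a minimal sketch of what retrying the existing fPutObject call with an increasing delay could look like (the retry count, delays, and the set of retried error codes are made-up example values, not tested settings):

// Javascript (hypothetical retry wrapper, not part of the current code)
async function putWithRetry(client, bucket, key, filePath, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.fPutObject(bucket, key, filePath)
    } catch (err) {
      // only retry throttling-style errors; rethrow anything else or give up on the last attempt
      const retriable = err.code === 'SlowDown' || err.code === 'TooManyRequests'
      if (!retriable || attempt === maxAttempts) throw err
      const delayMs = 1000 * 2 ** (attempt - 1) // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}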

@tw4l
Member

tw4l commented Mar 4, 2024

@wvengen From minio/minio#11147, it seems like maybe one way of approaching this would be to configure the Minio server's MINIO_API_REQUESTS_DEADLINE to a higher value.

In Browsertrix Cloud, we should be able to set this as an env var to a higher value if needed in chart/templates/minio.yaml.

Otherwise, that would need to be set however Minio is being deployed.
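
For a self-hosted MinIO deployment, that would be something along these lines (the 2m deadline is only an illustration):

# shell: start MinIO with a longer per-request deadline (example value)
MINIO_API_REQUESTS_DEADLINE=2m minio server /data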

@wvengen
Contributor Author

wvengen commented Mar 5, 2024

Thanks for your response!
Yes, if I would be running my own Minio server, that would be true. But this is an S3 service by a cloud provider (OpenStack SWIFT) that I have no control over, and there are probably reasons why it is configured this way (e.g. to avoid overloading the server, or waiting for until resources for the bucket are scaled up, like it can happen for AWS S3).

@tw4l
Member

tw4l commented Mar 5, 2024

Hm, good point. I don't think we've tested with OpenStack SWIFT, so we haven't seen this issue, but you're right that some general exception handling to slow down on 503 responses (and perhaps 429 Too Many Requests) might not be a bad idea.

We can also see if we're able to enable debug logging via the minio-js client. I'm marking this issue for investigation in the coming sprint and will report back.
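
If that is possible, enabling request tracing on the existing client might look roughly like this (assuming minio-js's traceOn/traceOff API):

// Javascript
s3Client.traceOn(process.stdout) // log each HTTP request/response to stdout
await s3Client.fPutObject('x', 'foo', '/tmp/foo')
s3Client.traceOff()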

@tw4l tw4l self-assigned this Mar 5, 2024
@tw4l tw4l moved this from Triage to Todo in Webrecorder Projects Mar 5, 2024
@tw4l
Member

tw4l commented Mar 5, 2024

Another thing to keep in mind: In the past when working with other applications, SWIFT has proved to be a problem for files > 5 GB, as SWIFT expects large files to be segmented a particular way. Not sure if that might be an issue with the crawler/minio-js client/SWIFT S3 endpoint as well.

For context: https://docs.openstack.org/swift/latest/overview_large_objects.html

@wvengen
Contributor Author

wvengen commented Mar 6, 2024

Thank you, I didn't know about SWIFT's large object support. (The files I had issues with were <5GB, but I might run into this issue later.) But it looks like SWIFT's S3 layer does convert multipart uploads to large object segments, so large objects should be supported when using S3. And I also see references to multipart delete in the source code, so I suppose that would be supported as well.
All in all, handling slow down responses might just be enough here.

@wvengen
Contributor Author

wvengen commented Mar 12, 2024

Experimenting with using the AWS S3 SDK instead of the Minio client in this forked branch.
Update: I am able to upload 2GB files with the client from the AWS S3 SDK, so it's slightly better, but now I get EPIPE on 4GB files, so that doesn't solve it per se. Note that the AWS S3 SDK uses smithy's retry strategy.
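
For reference, a minimal sketch of what a multipart upload with the AWS SDK v3 looks like (using @aws-sdk/client-s3 and @aws-sdk/lib-storage; the endpoint, credentials, and part size are placeholders, not the exact code from the fork):

// Javascript
const { S3Client } = require('@aws-sdk/client-s3')
const { Upload } = require('@aws-sdk/lib-storage')
const fs = require('fs')

const client = new S3Client({
  endpoint: 'https://s3.example.com',
  region: 'us-east-1',
  credentials: { accessKeyId: 'xx', secretAccessKey: 'xx' },
  forcePathStyle: true,
})

// Upload splits the file into parts and retries individual parts via the SDK's (smithy) retry strategy
const upload = new Upload({
  client,
  params: { Bucket: 'x', Key: 'foo', Body: fs.createReadStream('/tmp/foo') },
  partSize: 100 * 1024 * 1024,
})
await upload.done()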

@ikreymer
Member

Thanks for looking into this! Yes, happy to switch to the AWS S3 client instead of Minio if that works better, but I think we're generally limited to using an existing S3 client for this. I suppose you could always limit to smaller file sizes, but that may be less than ideal.

@wvengen
Contributor Author

wvengen commented Mar 26, 2024

Thanks, @ikreymer. I'm investigating this further with our storage provider. In any case, I can already see that the AWS S3 SDK handles Slow Down, whereas I did not see the MinIO client doing that. Also, the AWS S3 SDK first asks the server to confirm that the data is acceptable before sending it (by means of an Expect: 100-continue handshake). So I think switching to the AWS S3 SDK has several benefits.
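
If throttling still shows up with the AWS SDK, its retry behaviour can also be tuned on the client itself; a rough example (the values are illustrative, not what the branch uses):

// Javascript (hypothetical retry tuning)
const { S3Client } = require('@aws-sdk/client-s3')

const client = new S3Client({
  endpoint: 'https://s3.example.com',
  region: 'us-east-1',
  credentials: { accessKeyId: 'xx', secretAccessKey: 'xx' },
  maxAttempts: 10,       // retry each request up to 10 times
  retryMode: 'adaptive', // client-side rate limiting on throttling errors
})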

Would you like me to prepare a pull request? (There are some things to clean up.)

@ikreymer
Member

Thanks, would definitely appreciate it! There was also a request for region support in #515, and it looks like you were addressing that as well.

ikreymer pushed a commit that referenced this issue Apr 17, 2024
This should address the issue of connecting to buckets stored outside us-east-1 (#515) while the switch from Minio client to AWS SDK is being worked on (#479).

Co-authored-by: Mattia <[email protected]>
@Shrinks99 Shrinks99 added the blocked label Jul 10, 2024
Labels: blocked
Projects: Status: Todo
Development: No branches or pull requests
4 participants