S3 and GCS writes are limited to 50GB #5720
Comments
FYI, I tried this sync multiple times now and every single time it fails after exactly Edit: Looks like this is rather happening after uploading exactly |
So I've looked into this to the best of my knowledge, and it seems like this is a fundamental limitation of the way an S3 multipart upload works: there cannot be more than 10,000 parts per upload, and with the part size set to 5MB (line 38 in a53dd7e) it is not possible to write more than 10000 * 5MB to an S3 destination.
According to the documentation of the library that is used for writing to the underlying S3 destination at http://alexmojaki.github.io/s3-stream-upload/javadoc/apidocs/alex/mojaki/s3upload/StreamTransferManager.html#partSize-long-, what needs to be done here is to increase the size of the parts themselves, since the number of parts can't be increased. Should this be a configurable option in the destination settings? |
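To make the arithmetic behind the limit concrete, here is a minimal sketch in plain Java. It does not call the s3-stream-upload library; the class name `PartSizing` and both method names are hypothetical and exist only for illustration. It assumes the S3 limits of at most 10,000 parts per multipart upload and a 5 MB minimum part size, and that part sizes are expressed in MB, as the linked `partSize(long)` documentation appears to do.

```java
// Hypothetical helper, not part of Airbyte or s3-stream-upload.
final class PartSizing {
    // Assumed S3 limit: at most 10,000 parts per multipart upload.
    static final long MAX_PARTS = 10_000L;
    // Assumed S3 minimum part size (all parts except the last), in MB.
    static final long MIN_PART_SIZE_MB = 5L;
    static final long BYTES_PER_MB = 1024L * 1024L;

    private PartSizing() {}

    // Largest upload, in bytes, that fits into MAX_PARTS parts of the given size.
    static long maxUploadBytes(long partSizeMb) {
        return MAX_PARTS * partSizeMb * BYTES_PER_MB;
    }

    // Smallest part size, in MB, that lets an upload of totalBytes fit into MAX_PARTS parts.
    static long minPartSizeMb(long totalBytes) {
        long bytesPerPartUnit = MAX_PARTS * BYTES_PER_MB;
        // Ceiling division so that the last, partially filled part is accounted for.
        long needed = (totalBytes + bytesPerPartUnit - 1) / bytesPerPartUnit;
        return Math.max(needed, MIN_PART_SIZE_MB);
    }
}
```

With the 5MB part size described above, `PartSizing.maxUploadBytes(5)` gives the roughly 50GB ceiling from the issue title. For a 100 GiB sync, `PartSizing.minPartSizeMb(100L * 1024 * 1024 * 1024)` suggests a part size of at least 11 MB, which is the kind of value a configurable destination setting would need to accept.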
@flagbug thanks for the thorough write up! will take a look |
Hi @flagbug. Many thanks for your investigation. Would it be an acceptable fix if we made those args configurable from UI? Regards, |
@etsybaev For me personally sure, but I guess this is something that the Airbyte team needs to decide if it makes sense 😄 |
I believe the same applies for Snowflake. Seems that the fix #5890 only resolved the issue for S3 and GCS destinations, but not when using storage buckets as a staging loading method for data warehouses. |
@etsybaev re-opening this issue |
Environment
Current Behavior
When syncing our MSSQL database to Google Cloud Storage, after reading a few million rows, the job stops with a java.lang.IndexOutOfBoundsException. Note that the job doesn't fail, it just stops and never continues.
Expected Behavior
The syncing just works™
Logs
Steps to Reproduce
Are you willing to submit a PR?
Unlikely, except if it's a very obvious fix 😄