Generic S3 error: Converting table to pandas and pyarrow table fails. #1256
@shazamkash - Thanks for reporting this! From the response you showed, it seems like we are running into some sort of throttling on the storage side, though I'm not quite sure why. Could you see what happens if you configure the pyarrow S3 filesystem and pass that to `to_pyarrow_dataset`?
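A minimal sketch of that suggestion (not code from the thread): build the pyarrow S3 filesystem yourself and hand it to delta-rs. The credentials, endpoint URL, and table URI below are placeholders, and the assumed hand-off point is the `filesystem` argument of `to_pyarrow_dataset`.

```python
# Hedged sketch: read a Delta table through an explicitly configured
# pyarrow S3 filesystem instead of the built-in object store.

def read_via_pyarrow_fs(table_uri: str, endpoint: str):
    """Read a Delta table via a user-configured pyarrow S3 filesystem."""
    import pyarrow.fs as pafs
    from deltalake import DeltaTable

    s3 = pafs.S3FileSystem(
        access_key="<ACCESS_KEY>",   # placeholder credential
        secret_key="<SECRET_KEY>",   # placeholder credential
        endpoint_override=endpoint,  # non-AWS (e.g. Ceph RadosGW) endpoint
    )
    dt = DeltaTable(table_uri)
    # Pass the filesystem explicitly; the data files are then fetched
    # through pyarrow rather than the object store that raised the error.
    dataset = dt.to_pyarrow_dataset(filesystem=s3)
    return dataset.to_table().to_pandas()
```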
I tried what you suggested; please find the code and errors below.
Code:
Error from dt.to_pyarrow_dataset()
Error from dt.to_pandas()
Here is the list of files I get by running the following code, which also works.
List of files:
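The listing step above can be sketched as follows (a hedged sketch; the table URI is a placeholder). `DeltaTable.files()` returns the relative paths of the parquet files that make up the current table version.

```python
# Hedged sketch of listing the parquet files behind a Delta table.

def list_table_files(table_uri: str):
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri)
    # Paths of the data files referenced by the current table version
    return dt.files()
```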
Another thing I noticed: this only happens with data that is "big", from a few hundred MB to a few GB, and split into multiple parquet files. I can read tables that are very small (a few tens of MB) and saved in a single file. Any help would be appreciated, because I have read the same data before with an older version of delta-rs and it worked fine then; unfortunately, I no longer remember the exact version. Also, here is the full error, which I was able to capture now:
Can I take it?
@tsafacjo - certainly :)
Environment
Delta-rs version: 0.8.1
Binding: Python
Environment:
Docker container:
Python: 3.10.7
OS: Debian GNU/Linux 11 (bullseye)
S3: Non-AWS (Ceph based)
Bug
What happened:
When reading the delta table, the table itself is read fine and clearly exists. But converting it to pandas, or converting the pyarrow dataset to a table, fails with the same error shown below.
I have tried reading the same table with PySpark and it works fine. The parquet data is about 1 GB compressed and 3 GB uncompressed. Furthermore, the table was written to the deltalake using the same delta-rs version.
Error:
How to reproduce it:
My Code:
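A minimal repro along these lines (a hedged reconstruction, not the reporter's actual code; the storage option keys follow delta-rs's S3 configuration, and all values and the endpoint are placeholder assumptions for the Ceph setup described above):

```python
# Hedged reconstruction of a minimal repro against a non-AWS S3 endpoint.
storage_options = {
    "AWS_ACCESS_KEY_ID": "<ACCESS_KEY>",        # placeholder credential
    "AWS_SECRET_ACCESS_KEY": "<SECRET_KEY>",    # placeholder credential
    "AWS_ENDPOINT_URL": "https://s3.example.internal",  # Ceph RadosGW endpoint
    "AWS_REGION": "us-east-1",
}

def reproduce(table_uri: str):
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri, storage_options=storage_options)
    dt.to_pyarrow_dataset()  # building the dataset succeeds
    return dt.to_pandas()    # fails here with "Generic S3 error"
```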
More details:
I am not sure if this information helps, but I get the same error when reading the table with Polars.