Python library error reading from large Delta Table on S3 #882
Comments
When I run with RUST_BACKTRACE=full, I get the following stacktrace fragment (other frames are unknown):
Hmm, this one is a bit of a puzzle, as it seems that the connection to the storage location fails at some point during the download of a file. Could you try the pyarrow native S3 file system to see if the error still occurs? You have to wrap it in a SubTreeFileSystem though, as described here: https://delta-io.github.io/delta-rs/python/usage.html#custom-storage-backends
Using filesystem = fs.SubTreeFileSystem("<bucket>/<table key>", fs.S3FileSystem())
dt.to_pandas(filesystem=filesystem) solves the
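For anyone who wants to try the same workaround, here is a minimal sketch of it in full. The helper names `subtree_root` and `read_via_pyarrow_s3` are hypothetical (they are not part of deltalake or pyarrow), and the sketch assumes the table URI has the form `s3://<bucket>/<table key>`. Any keyword arguments are passed straight to `fs.S3FileSystem`, for example `endpoint_override` for an on-prem store.

```python
from urllib.parse import urlparse


def subtree_root(table_uri: str) -> str:
    """Turn 's3://bucket/path/to/table' into 'bucket/path/to/table'.

    Hypothetical helper: SubTreeFileSystem expects the base path
    without the URI scheme.
    """
    parsed = urlparse(table_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/key URI, got {table_uri!r}")
    return f"{parsed.netloc}/{parsed.path.strip('/')}".rstrip("/")


def read_via_pyarrow_s3(table_uri: str, **s3_options):
    """Read a Delta table through pyarrow's native S3 file system.

    Hypothetical wrapper around the workaround from this thread.
    """
    # Imported here so subtree_root() can be used without these packages.
    from deltalake import DeltaTable
    from pyarrow import fs

    dt = DeltaTable(table_uri)
    # s3_options are forwarded as-is, e.g. endpoint_override="host:port".
    raw_fs = fs.S3FileSystem(**s3_options)
    # Root the file system at the table location so the relative file
    # paths from the Delta log resolve correctly.
    filesystem = fs.SubTreeFileSystem(subtree_root(table_uri), raw_fs)
    return dt.to_pandas(filesystem=filesystem)
```

With this, file downloads go through pyarrow's S3 client instead of the built-in storage backend, which is why it can sidestep the failure while the underlying problem is investigated.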
I also tried with the SubTreeFileSystem and that has solved my PanicException too. Does that give any hints as to what's wrong with the original version? I'm happy to debug further...
No need to debug further; I've found a couple of issues in #893 while adding integration tests with S3.
@joshuarobinson - is it possible for you to build off current main and see if the error still exists? We have some indication that the dropped clients issue could be resolved now.
I got a similar (but not identical) traceback with a table of a few thousand rows on S3-compatible storage (LakeFS backed by Minio):
deltalake.PyDeltaTableError: Generic S3 error: Error performing get request main/silver/re/list/part-00003-a23a8231-bcb4-40f7-81d7-1dc3515ea120-c000.snappy.parquet: response error "request error", after 0 retries: error sending request for url (http://lakefs:8000/<bucket>/main/silver/re/list/part-00003-a23a8231-bcb4-40f7-81d7-1dc3515ea120-c000.snappy.parquet): dispatch task is gone: runtime dropped the dispatch task
Table was written using Spark 3.1.1 / delta-spark 2.1.1, reading using
@roeap I don't currently have the bandwidth to build off main, but I'll follow and try on the next release
@joshuarobinson We just released 0.6.4. Let us know if you still have any issue with reading tables.
@joshuarobinson - could you check if the latest release fixes this for you?
@roeap I can confirm that version 0.7.0 now allows me to write a delta table successfully. Thanks for the follow-up
Great! Will close the issue then :)
Environment: ubuntu 22.04, reading from on-prem S3 object store to Arrow Table
Delta-rs version: 0.6.2
Binding: Python
Bug
What happened:
PanicException when reading in from a Delta table in an on-prem S3 object store (Swift).
Reading some tables seems to work and others not; anecdotally, the smaller ones are fine and the larger ones are not.
How to reproduce it:
More details:
Error message: