Empty array giving error #2439
Comments
@kesavkolla Is it possible that you could provide a minimal repro parquet file that we can use for testing? |
I had to create a zip file from the compressed parquet because GitHub does not allow uploading the file otherwise. Just unzip the file and try to load it via DataFusion, and you get the above error. Here is my code snippet:
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<(), String> {
    let ctx = SessionContext::new();
    // Register the unzipped parquet file as the table "patient".
    ctx.register_parquet("patient", "part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet",
        ParquetReadOptions::default()).await.unwrap();
    // Project only the non-nested `id` column; this still triggers the error.
    let df = ctx.sql("SELECT patient.id FROM patient LIMIT 10").await.unwrap();
    df.show().await.unwrap();
    Ok(())
}
part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet.zip |
I believe this will be fixed in arrow 15 (in a few weeks) due to @tustvold 's work in apache/arrow-rs#1682 |
Yup, I can confirm that with apache/arrow-rs#1700 it is possible for the parquet reader to handle the file. Unfortunately work remains to support this on DataFusion's side, e.g. #2453. I fully intend to get this working, there will just be many bugs along the way 😅 |
Looking forward to doing early testing |
I wish this were included in arrow 14 itself. :-) |
Could we close this issue? |
I just tried this file with @kesavkolla's example:
❯ create external table patient stored as parquet location 'part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet';
0 rows in set. Query took 0.002 seconds.
❯ SELECT patient.id FROM patient LIMIT 10;
+--------------------------------------+
| id                                   |
+--------------------------------------+
| eTplvxRvcd-eT1nEI8BvQRQ3             |
| 1127421b-66fd-85ff-b92e-827c9a280be2 |
| 3bdac299-3731-0acd-9cd9-fbba40236e3a |
| 47e6dfcb-30a9-2a66-c8fb-984da359ead1 |
| 8aec8d06-74af-aed5-0132-5b77fc6b418b |
| 03af4547-1d67-68ae-ebbc-cc5a9fc6c898 |
| 180aefff-e7cc-940e-cb2c-3ed99c1aed39 |
| 966070ad-f22e-631f-acf2-657c90f903f1 |
| d94a1af1-bf9b-a705-bc54-2d0009c39981 |
| 7f217143-f36c-707a-b77c-f4a1cf70f952 |
+--------------------------------------+
10 rows in set. Query took 0.003 seconds.
The query now succeeds, so closing as complete |
I have a parquet file whose schema contains nested arrays. There are several levels where the type is array but the data is empty. When I use this data with DataFusion I get this error:
Even if I use a projection with only non-nested columns, I still get the above error. Can someone share thoughts on how I can get around this error?
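For readers wondering why an empty array is a distinct case for a Parquet reader, here is a minimal sketch in plain Rust (no parquet or arrow crates). It assumes the standard three-level Parquet LIST layout with an optional list and optional elements; the actual schema of the attached file is not shown in this thread, and the `Slot` type and `encode` function are hypothetical names used only for illustration. Under that layout, a null list, an empty list, a null element and a present element each get a different definition level, and the reader has to reconstruct the empty list from a level entry that carries no value.

```rust
/// One entry in the flattened column: (repetition level, definition level, value).
/// Hypothetical illustration type, not part of arrow-rs or DataFusion.
type Slot = (u8, u8, Option<i32>);

/// Flatten rows of an optional list of optional i32 into Parquet-style levels,
/// assuming max definition level 3 and max repetition level 1.
fn encode(rows: &[Option<Vec<Option<i32>>>]) -> Vec<Slot> {
    let mut out = Vec::new();
    for row in rows {
        match row {
            // The list itself is null: definition level 0.
            None => out.push((0, 0, None)),
            // The list exists but has no elements: definition level 1.
            Some(items) if items.is_empty() => out.push((0, 1, None)),
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // Repetition level 0 starts a new row, 1 continues the same list.
                    let rep = if i == 0 { 0 } else { 1 };
                    match item {
                        // The element slot exists but is null: definition level 2.
                        None => out.push((rep, 2, None)),
                        // The element is present: definition level 3.
                        Some(v) => out.push((rep, 3, Some(*v))),
                    }
                }
            }
        }
    }
    out
}

fn main() {
    let rows = vec![Some(vec![Some(1), Some(2)]), Some(vec![]), None, Some(vec![None])];
    for slot in encode(&rows) {
        println!("{:?}", slot);
    }
}
```

With deeper nesting each additional list level adds more definition and repetition levels, so an empty array at an inner level is encoded by yet another intermediate definition level.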