i32 limit in JSON stats #2646
Comments
Can you share a reproducible example? The code below works fine.
Thanks for your example, @sherlockbeard; I can confirm it works fine for me as well. I have been trying to find the offending JSON file in the
A Delta table is read from the last checkpoint plus the JSON commit files created after it. There should be a … After that, you can select the JSON files like
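A minimal sketch of that starting point, assuming the table is on the local filesystem: read `_delta_log/_last_checkpoint` if it exists and keep only the commit JSON files whose version is greater than the checkpoint version. Those are the files worth inspecting for bad stats. The helper name `commits_after_checkpoint` is hypothetical and not part of deltalake.

```python
import json
from pathlib import Path


def commits_after_checkpoint(table_path: str) -> list[Path]:
    """Return commit JSON files newer than the last checkpoint (hypothetical helper)."""
    log_dir = Path(table_path) / "_delta_log"
    last_checkpoint = log_dir / "_last_checkpoint"

    # Without a _last_checkpoint file, every commit file has to be replayed.
    checkpoint_version = -1
    if last_checkpoint.exists():
        checkpoint_version = json.loads(last_checkpoint.read_text())["version"]

    commits = []
    for path in log_dir.glob("*.json"):
        # Commit files use a zero-padded version as their name, e.g. 00000000000000000010.json
        if path.stem.isdigit() and int(path.stem) > checkpoint_version:
            commits.append(path)
    return sorted(commits, key=lambda p: int(p.stem))
```

Parquet checkpoint files and `_last_checkpoint` itself are skipped because they do not match the `<digits>.json` pattern.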
Big thanks, Sherlock, I found the culprit! This is the JSON in
Hey @alfredolainez, in the bug report the number is edited; my `_last_checkpoint` `size_in_bytes`
The table is written to frequently, so the number keeps changing. When I checked
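Because the rejected integer (4051124561) does not fit in a signed 32-bit integer, one way to locate it is to walk every JSON action and report each integer outside the i32 range together with its key path. A minimal standard-library sketch; the function names are hypothetical. It assumes each file holds one JSON object per line, which holds for commit files and for a single-line `_last_checkpoint`. The `stats` field of an `add` action is itself a JSON document stored as a string, so the sketch decodes it too.

```python
import json
from pathlib import Path
from typing import Any, Iterator

I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def out_of_range_ints(value: Any, path: str = "$") -> Iterator[tuple[str, int]]:
    """Yield (key path, integer) for every integer that does not fit in an i32."""
    if isinstance(value, bool):
        return  # bool is a subclass of int in Python; skip it
    if isinstance(value, int):
        if not I32_MIN <= value <= I32_MAX:
            yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from out_of_range_ints(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from out_of_range_ints(child, f"{path}[{i}]")


def scan_log_file(file: Path) -> list[tuple[int, str, int]]:
    """Return (line number, key path, value) for oversized integers in a log file."""
    hits = []
    for lineno, line in enumerate(file.read_text().splitlines(), start=1):
        if not line.strip():
            continue
        action = json.loads(line)
        # `add.stats` is a JSON document encoded as a string, so decode it as well.
        stats = action.get("add", {}).get("stats")
        if isinstance(stats, str):
            action["add"]["stats"] = json.loads(stats)
        for key_path, number in out_of_range_ints(action):
            hits.append((lineno, key_path, number))
    return hits
```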
Interesting. I copied my Copying just this as the
For reference, I am using `deltalake==0.18.2`.
Yep, using this I was able to reproduce it. Funny thing, I tried with … OK, I got the reason. Just to confirm: does your table have deletion vectors?
Using that resulted in a different error for me; I think my table is a bit different from yours. Judging by issue #1468, `SizeInBytes` seems to be optional, which is probably why. I'm not sure whether the table has deletion vectors, but I would imagine so, since storage performance is critical here. Is there an easy way to check?
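One possible check, sketched under the assumption that the `_delta_log` directory is readable locally: scan the commit JSON files for a `protocol` action whose `readerFeatures` list contains `deletionVectors`, and for `add` actions that carry a `deletionVector` descriptor. The helper name is hypothetical. It only reads JSON commits, so anything that now lives solely in a Parquet checkpoint (for example, a `protocol` action from a commit that has since been cleaned up) is not seen.

```python
import json
from pathlib import Path


def deletion_vector_usage(table_path: str) -> dict[str, bool]:
    """Report deletion-vector signals found in JSON commit files (hypothetical helper)."""
    feature_enabled = False
    files_with_dv = False
    for commit in sorted((Path(table_path) / "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            protocol = action.get("protocol")
            # Table features are listed by name; readerFeatures may be absent or null.
            if protocol and "deletionVectors" in (protocol.get("readerFeatures") or []):
                feature_enabled = True
            add = action.get("add")
            # A data file with deleted rows points at its DV via `deletionVector`.
            if add and add.get("deletionVector"):
                files_with_dv = True
    return {"feature_enabled": feature_enabled, "files_with_dv": files_with_dv}
```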
@rtyler, @ion-elgreco, fixing this will require a change to the `DeletionVectorDescriptor` field in the delta protocol.
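For context, the Delta protocol specification declares the descriptor's `offset` and `sizeInBytes` as `Int` (32-bit) and `cardinality` as `Long` (64-bit). The sketch below mirrors those declared widths and rejects a value that overflows them, the way a strongly typed reader would. It is illustrative Python, not delta-rs code; the class and helper names are hypothetical, and delta-rs's actual field types may differ.

```python
from dataclasses import dataclass
from typing import Optional

I32_MAX = 2**31 - 1
I64_MAX = 2**63 - 1


def _check(name: str, value: int, limit: int) -> int:
    """Raise if value falls outside the signed range whose maximum is `limit`."""
    if not -(limit + 1) <= value <= limit:
        raise ValueError(f"{name}={value} does not fit the declared width")
    return value


@dataclass(frozen=True)
class DeletionVectorDescriptor:
    storage_type: str         # storageType: String
    path_or_inline_dv: str    # pathOrInlineDv: String
    offset: Optional[int]     # offset: Option[Int] (32-bit)
    size_in_bytes: int        # sizeInBytes: Int (32-bit)
    cardinality: int          # cardinality: Long (64-bit)

    @classmethod
    def from_json(cls, raw: dict) -> "DeletionVectorDescriptor":
        offset = raw.get("offset")
        return cls(
            storage_type=raw["storageType"],
            path_or_inline_dv=raw["pathOrInlineDv"],
            offset=None if offset is None else _check("offset", offset, I32_MAX),
            size_in_bytes=_check("sizeInBytes", raw["sizeInBytes"], I32_MAX),
            cardinality=_check("cardinality", raw["cardinality"], I64_MAX),
        )
```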
Now I am a little confused, since delta-rs doesn't support reading tables with deletion vectors.
I don't have many details on how the table is written, but as far as I understand it is written with Spark. This is the first time I am trying to read these tables with Polars, so I'm not sure whether it used to work before. However, I can read other tables in the same lake successfully, and I can see the … Deletion vectors might be unrelated, though, no? Our toy example shows the problem without them.
Created a PR that fixes the dummy example.
Environment
Delta-rs version: 0.18.2
Binding: Python
Environment:
Bug
What happened:
When reading a DeltaLake table from Polars using `pl.read_delta`, I get the following error:

`DeltaProtocolError: Invalid JSON in file stats: invalid value: integer 4051124561, expected i32 at line 1 column 70`
which ultimately comes from deltalake:
What you expected to happen: My Python code can read other tables successfully; only this particular table triggers the problem. This table is accessed frequently and I can read it through Spark without issues, so I expected deltalake to read it as well. I'm not sure whether the int32 limitation is part of the protocol or whether the library should allow larger integer types.
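The arithmetic behind the error is simple: the reported value exceeds the largest signed 32-bit integer, 2^31 - 1 = 2,147,483,647, but is below 2^32, so it would fit an unsigned 32-bit field and any 64-bit field. A quick check (the constant and function names are illustrative):

```python
I32_MIN, I32_MAX = -(2**31), 2**31 - 1  # i32 range; I32_MAX == 2147483647
U32_MAX = 2**32 - 1                      # 4294967295
I64_MAX = 2**63 - 1


def fits_i32(n: int) -> bool:
    """True if n is representable as a signed 32-bit integer."""
    return I32_MIN <= n <= I32_MAX


reported = 4051124561  # integer from the DeltaProtocolError message
print(fits_i32(reported))   # False: exceeds the i32 maximum
print(reported <= U32_MAX)  # True: would fit an unsigned 32-bit field
print(reported <= I64_MAX)  # True: fits a signed 64-bit field
```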