-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Handle Date32 columns in Arrow tables and Polars DataFrames #3377
Conversation
@@ -56,7 +56,7 @@ def _dataset_name(values: Union[dict, list, core.InlineDataset]) -> str: | |||
values = values.to_dict() | |||
if values == [{}]: | |||
return "empty" | |||
values_json = json.dumps(values, sort_keys=True) | |||
values_json = json.dumps(values, sort_keys=True, default=str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For computing the values hash, I think it's fine to fallback to string representation for types not supported by json.dumps
(datetime.date
in this case).
@@ -241,7 +241,7 @@ def check_pre_transformed_vega_spec(vega_spec): | |||
|
|||
# Check that the bin transform has been applied | |||
row0 = data_0["values"][0] | |||
assert row0 == {"a": "A", "b": 28, "b_end": 28.0, "b_start": 0.0} | |||
assert row0 == {"a": "A", "b_end": 28.0, "b_start": 0.0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VegaFusion 1.6.6 strips out the "description" encoding field (which isn't used in the canvas renderer), so it's able to do a better job dropping unused columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonmmease, really nice PR! Literature level code👍.
I added one more commit with a few more isinstance(data, DataFrameLike)
over hasattr(data, __dataframe__)
I'm surprised that pandas has no a to_arrow()
method or something a like.
for convert_method_name in ("arrow", "to_arrow", "to_arrow_table"): | ||
convert_method = getattr(dfi_df, convert_method_name, None) | ||
if callable(convert_method): | ||
result = convert_method() | ||
if isinstance(result, pa.Table): | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a joy to read this type of code diff. Really nice approach!
All tests pass. Merging! Thanks again @jonmease! |
Closes #3280
This includes:
from_dataframe
function (which doesn't support Date32 yet).DataFrameLike
protocol to be@runtime_checkable
(which adds support forisinstance
checks), and switched ourif hasattr(data, '__dataframe__'):
calls toif isinstance(data, DataFrameLike):
calls. This way mypy knows thatdata
is aDataFrameLike
inside the branch and is happy for it to be passed to thearrow_table_from_dfi_dataframe
function.