-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: json in struct destination type #1187
Conversation
Question: Does this change anything with regards to I/O? For example, I notice with |
How did you test? I tried pandas df with pa.large_string dtype is interpreted as normal string. It shouldn't affect input as large_string, since we convert it to normal string in our I/O. screen/763yZzyywmEVWbR destination table screen/4u6scyqtczXSAK8 |
It was part of my debugger step-through of #371 but it's possible I accidentally changed something in the I/O path there. |
@@ -107,7 +107,7 @@ def from_table( | |||
raise ValueError("must set at most one of 'offests', 'primary_key'") | |||
if any(i.field_type == "JSON" for i in table.schema if i.name in schema.names): | |||
warnings.warn( | |||
"Interpreting JSON column(s) as StringDtype. This behavior may change in future versions.", | |||
"Interpreting JSON column(s) as StringDtype and pyarrow.large_string. This behavior may change in future versions.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just say "... as pyarrow.large_string"? StringDtype here is not relevant anymore, no?
@@ -2564,13 +2564,13 @@ def _get_rows_as_json_values(self) -> Block: | |||
), | |||
T1 AS ( | |||
SELECT *, | |||
JSON_OBJECT( | |||
TO_JSON_STRING(JSON_OBJECT( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an explicit TO_JSON_STRING? Doesn't the change in ibis_types.py::ibis_dtype_to_bigframes_dtype take care of this post read?
if isinstance(ibis_dtype, ibis_dtypes.JSON):
...
return bigframes.dtypes.JSON_DTYPE
Patches the json output as BQ JSON type instead of STRING. Which avoids JSON destination table error in b/381148539 and unblocks Multimodal integrations.
Our current implementation convert JSON to STRING completely when reading into BigFrames. Here it uses pd.ArrowDtype(pa.large_string()) as BigFrames Dtype and pa.large_string() as PA dtype, and pass the information though in BigFrames to determine the output type and parse STR back to JSON at the end.
It is a workaround of the workaround of JSON as STR. We shall rework the JSON support after pyarrow brings in json type.