Support for string[pyarrow] dtype #954
Comments
Thanks for the request @bnaul! This does look like an improvement. I wonder if the existing
Note: even if That could prevent some unnecessary transformations to Python string object and back again.
Yep, Maybe rather than clutter the API with another argument, it would make more sense to just set the default string type to
I hesitate to do it by default when pandas still considers it "experimental". Then again, so is the Int64 dtype, but I'm planning on using it by default in In this case, there isn't a data loss issue with the pandas default behavior and it's not been around quite as long as int64, so I'm not as keen to use What if we had a
Definitely fair, as far as I can tell the "experimental" label was simply copy-pasted from the other array type docstrings (
Update: definitely don't make it the default, probably should have taken the warning a little more seriously...this seems like extremely basic behavior that isn't supported
Digging a bit more there's also pandas-dev/pandas#42597 etc so it's definitely not feature complete.
I've been doing some thinking about this issue. I think for v3, we should add some kind of string dtype support. Possibly: string[pyarrow] if available, then string, then object. That would let us continue to support a wide range of pandas versions. Alternatively/in-addition we could expose the types mapper argument. Our default types mapper could call the user-supplied one and only continue with the default logic if the user-supplied one returned None.
Oh, just looked at the thread. Yeah, let's not make it the default in that case. Exposing types mapper should still be done for v3
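To make the two ideas in this comment concrete, here is a rough sketch, not the library's actual implementation. The names `pick_string_dtype` and `chain_types_mappers` are hypothetical and exist only for illustration. The first function tries string[pyarrow], then string, then object. The second wraps a default types mapper so that a user-supplied mapper is consulted first and the default logic only runs when the user mapper returns None.

```python
from typing import Any, Callable, Optional

# A types mapper takes an Arrow type and returns a pandas dtype, or None
# to signal "no opinion, fall through to the next mapper".
TypesMapper = Callable[[Any], Optional[Any]]


def pick_string_dtype() -> Any:
    """Hypothetical helper: best available string dtype.

    Order: string[pyarrow] if available, then string, then object.
    """
    try:
        import pandas as pd
    except ImportError:
        # No pandas at all; returned only so the sketch runs anywhere.
        return "object"
    try:
        # Needs pandas >= 1.3 and an installed pyarrow.
        return pd.StringDtype("pyarrow")
    except (ImportError, TypeError, AttributeError):
        pass
    try:
        # Plain "string" dtype, pandas >= 1.0.
        return pd.StringDtype()
    except AttributeError:
        return "object"


def chain_types_mappers(
    user_mapper: Optional[TypesMapper], default_mapper: TypesMapper
) -> TypesMapper:
    """Hypothetical helper: try the user mapper first, then the default."""

    def mapper(arrow_type: Any) -> Optional[Any]:
        if user_mapper is not None:
            result = user_mapper(arrow_type)
            if result is not None:
                return result
        # The user mapper declined (or was not given): use default logic.
        return default_mapper(arrow_type)

    return mapper
```

A combined mapper built this way could then be passed as the `types_mapper` argument of `pyarrow.Table.to_pandas`, which already treats a None return value as "use the default conversion".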
Update: we do use types mapper now, but haven't yet provided an override for string or other dtypes.
Might make more sense for this to be string specific than exposing Arrow types as part of the pandas to_dataframe API.
Pandas 1.3 added a new string[pyarrow] dtype which can be considerably more memory-efficient. I'm not sure what all would be involved, but obviously it would be nice to support this natively, since presumably(?) we already communicate the data in the appropriate format for the pyarrow string type before converting it back to Python string objects. Maybe an option like the one introduced in #848 for geography types could be used to determine the behavior?
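For anyone who wants to check the memory claim on their own data, a small sketch along these lines may help. The helper name `compare_string_memory` is made up for this example, and it returns None when string[pyarrow] is not available (older pandas, or no pyarrow installed). The actual numbers depend on the data and the pandas/pyarrow versions, so none are quoted here.

```python
from typing import Optional, Sequence, Tuple


def compare_string_memory(values: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (object_bytes, pyarrow_bytes), or None if unavailable."""
    try:
        import pandas as pd

        as_object = pd.Series(values, dtype="object")
        # Raises on pandas < 1.3 or when pyarrow is not installed.
        as_arrow = pd.Series(values, dtype="string[pyarrow]")
    except (ImportError, TypeError, ValueError):
        return None
    # deep=True counts the Python string objects behind the object column.
    return (
        int(as_object.memory_usage(deep=True)),
        int(as_arrow.memory_usage(deep=True)),
    )


sizes = compare_string_memory(["alpha", "beta", "gamma"] * 10_000)
if sizes is None:
    print("string[pyarrow] is not available in this environment")
else:
    print(f"object: {sizes[0]} bytes, string[pyarrow]: {sizes[1]} bytes")
```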