-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for StringDtype (fixes #1237) #2319
Conversation
You should try to implement this the same way as done here:
And please use the same PR as you make changes, rather than opening a new one. |
@timkpaine I have implemented the changes. Kindly let me if there are any more changes after you review them. Thank you. |
It should be in the file linked, near the line linked. |
12d1845
to
007cd27
Compare
@texodus this lgtm, I know we're looking to replace all of pandas / numpy with native arrow in/out so I leave up to you whether or not you want to merge. |
@@ -78,6 +78,12 @@ def deconstruct_pandas(data, kwargs=None): | |||
if isinstance(v, pd.CategoricalDtype): | |||
data[k] = data[k].astype(str) | |||
|
|||
# convert StringDtype to str | |||
if isinstance(data, pd.DataFrame) and hasattr(pd, "CategoricalDtype"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this hasattr
necessary?
# convert StringDtype to str | ||
if isinstance(data, pd.DataFrame) and hasattr(pd, "CategoricalDtype"): | ||
for k, v in data.dtypes.items(): | ||
if isinstance(v, pd.StringDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe this was meant to be used in the hasattr
, hasattr(pd, "StringDType")
since StringDType
is the new type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @MaDufie! Looks good!
We should really be taking advantage of the internal dictionary for Categories, but we don't really provide any fast path loading for DataFrame yet and likely won't as we move to Arrow.
Add support for StringDtype (available in Pandas >=1.0)
Fixes #1237
AFTER