-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String dtype: avoid surfacing pyarrow exception in binary operations #59610
String dtype: avoid surfacing pyarrow exception in binary operations #59610
Conversation
pandas/core/arrays/arrow/array.py
Outdated
except pa.lib.ArrowNotImplementedError as err: | ||
raise TypeError(self._op_method_error_message(other_original, op)) from err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am doing from err
here, so the original pyarrow message like "ArrowNotImplementedError: Function 'binary_join_element_wise' has no kernel matching input types (large_string, int64, large_string)" is still visible up in the traceback.
For the new string dtype, I would actually be fine with suppressing this (using from None
) to limit the length of the error traceback. But for usage of ArrowDtype in general, I suppose this information is still very useful to see.
But as I mentioned in the top post, I could also only do this try/except in case we are in ArrowStringArray, and leave the generic ArrowExtensionArray as is (raising the pyarrow error directly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the from err
- I don't see a huge harm in including all of that information
OK, this should be fully ready for review now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change - great work!
pytest.mark.xfail( | ||
raises=TypeError, reason="Can only string multiply by an integer." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice to see these go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thanks @jorisvandenbossche |
For the pyarrow storage, in the generic ArrowExtensionArray implementation, we currently just call the pyarrow compute kernel, regardless of the input types are supported or not, potentially resulting in pyarrow errors surfacing to the user (typically
pyarrow.lib.ArrowNotImplementedError
).This has several issues: 1) it's a bit annoying for testing (the tests currently import pyarrow to test the expected error message, which then fails if pyarrow is not installed), but 2) moreover this is a bad user experience IMO: it gives error messages that can be confusing (including "implementation details" from pyarrow) and inconsistent (compared to non-pyarrow backed dtypes), and it actually also gives a wrong exception type (as many of those operations are simply not supported for strings, and are expected to raise a TypeError, instead of a NotIplementedError which gives the impression it might be implemented in the future).
I am currently doing this for all arrow types, but I could also limit this to only those cases where it is used by the ArrowStringArray, and not for other ArrowDtype cases.
xref #54792