String dtype: avoid surfacing pyarrow exception in binary operations #59610

jorisvandenbossche · 2024-08-26T13:32:15Z

For the pyarrow storage, in the generic ArrowExtensionArray implementation, we currently just call the pyarrow compute kernel regardless of whether the input types are supported. This can result in pyarrow errors surfacing to the user (typically pyarrow.lib.ArrowNotImplementedError).

This has several issues. 1) It is a bit annoying for testing: the tests currently import pyarrow to check the expected error message, which then fails if pyarrow is not installed. 2) More importantly, it is a bad user experience IMO. The error messages can be confusing (they include "implementation details" from pyarrow) and inconsistent (compared to non-pyarrow-backed dtypes), and the exception type is actually wrong: many of these operations are simply not supported for strings and are expected to raise a TypeError, not a NotImplementedError, which gives the impression that the operation might be implemented in the future.
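A minimal, self-contained sketch of the proposed behaviour, using stand-ins instead of pandas or pyarrow internals. BackendNotImplementedError plays the role of pyarrow.lib.ArrowNotImplementedError (which subclasses NotImplementedError), and KERNELS, call_kernel and binary_op are hypothetical names, not pandas APIs. The point it illustrates: once the "no kernel matched" error is translated, callers and tests only need to handle TypeError and never have to import the backend.

```python
import operator


class BackendNotImplementedError(NotImplementedError):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


# Hypothetical kernel registry: (op name, left type, right type) -> function.
KERNELS = {
    ("add", str, str): operator.add,
    ("add", int, int): operator.add,
}


def call_kernel(name, left, right):
    """Look up a kernel for the input types; raise the backend error if none exists."""
    kernel = KERNELS.get((name, type(left), type(right)))
    if kernel is None:
        raise BackendNotImplementedError(
            f"Function '{name}' has no kernel matching input types "
            f"({type(left).__name__}, {type(right).__name__})"
        )
    return kernel(left, right)


def binary_op(name, left, right):
    """Call the kernel, translating 'no kernel' errors into a TypeError."""
    try:
        return call_kernel(name, left, right)
    except BackendNotImplementedError as err:
        raise TypeError(
            f"unsupported operand type(s) for {name}: "
            f"'{type(left).__name__}' and '{type(right).__name__}'"
        ) from err


print(binary_op("add", "a", "b"))  # ab
try:
    binary_op("add", "a", 1)
except TypeError as exc:  # no backend import needed to catch this
    print(exc)  # unsupported operand type(s) for add: 'str' and 'int'
```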

I am currently doing this for all Arrow types, but I could also limit it to the cases used by ArrowStringArray, and not apply it to other ArrowDtype cases.

xref #54792

jorisvandenbossche · 2024-08-26T13:35:36Z

pandas/core/arrays/arrow/array.py

+        except pa.lib.ArrowNotImplementedError as err:
+            raise TypeError(self._op_method_error_message(other_original, op)) from err


I am using from err here, so the original pyarrow message, e.g. "ArrowNotImplementedError: Function 'binary_join_element_wise' has no kernel matching input types (large_string, int64, large_string)", is still visible further up in the traceback.
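A standalone sketch of what from err does to the traceback, using only the standard library. BackendNotImplementedError is a hypothetical stand-in for the pyarrow exception and op_with_chaining is an illustrative name. Explicit chaining stores the original exception in __cause__, so the formatted traceback shows both exceptions, joined by "The above exception was the direct cause of the following exception:".

```python
import traceback


class BackendNotImplementedError(NotImplementedError):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


def op_with_chaining():
    try:
        raise BackendNotImplementedError(
            "Function 'binary_join_element_wise' has no kernel matching input types "
            "(large_string, int64, large_string)"
        )
    except BackendNotImplementedError as err:
        # `from err` sets __cause__, so the original error stays in the traceback.
        raise TypeError("unsupported operand type(s) for +: 'str' and 'int'") from err


try:
    op_with_chaining()
except TypeError as exc:
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Both exceptions are printed, the pyarrow-style message first.
    print(text)
```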

For the new string dtype, I would actually be fine with suppressing this (using from None) to limit the length of the error traceback. But for usage of ArrowDtype in general, I suppose this information is still very useful to see.
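For comparison, a sketch of the from None alternative, again with a hypothetical stand-in exception. It sets __suppress_context__, so the printed traceback only contains the TypeError. The original exception is still attached as __context__, so it remains reachable programmatically even though it is hidden from the default output.

```python
import traceback


class BackendNotImplementedError(NotImplementedError):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


def op_without_chaining():
    try:
        raise BackendNotImplementedError(
            "Function 'binary_join_element_wise' has no kernel matching input types "
            "(large_string, int64, large_string)"
        )
    except BackendNotImplementedError:
        # `from None` suppresses the context: only the TypeError is printed.
        raise TypeError("unsupported operand type(s) for +: 'str' and 'int'") from None


try:
    op_without_chaining()
except TypeError as exc:
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    print(text)  # shorter traceback, no pyarrow-style message
    print(type(exc.__context__).__name__)  # BackendNotImplementedError
```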

But as I mentioned in the top post, I could also do this try/except only in ArrowStringArray, and leave the generic ArrowExtensionArray as is (raising the pyarrow error directly).
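A sketch of the narrower option, under the assumption that the string array subclasses the generic Arrow array. GenericArrowArray, StringArrowArray and binary_op are hypothetical stand-ins, not the real pandas classes or method signatures. Only the string subclass wraps the call in try/except, so generic ArrowDtype users keep seeing the backend error unchanged.

```python
class BackendNotImplementedError(NotImplementedError):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


class GenericArrowArray:
    """Hypothetical stand-in for the generic ArrowExtensionArray."""

    def binary_op(self, other, op_name):
        # Stub: always behaves as if no kernel matched; the error propagates as is.
        raise BackendNotImplementedError(
            f"Function '{op_name}' has no kernel matching input types"
        )


class StringArrowArray(GenericArrowArray):
    """Hypothetical stand-in for ArrowStringArray: only this class translates."""

    def binary_op(self, other, op_name):
        try:
            return super().binary_op(other, op_name)
        except BackendNotImplementedError as err:
            raise TypeError(
                f"Cannot perform '{op_name}' between str and {type(other).__name__}"
            ) from err


for arr in (GenericArrowArray(), StringArrowArray()):
    try:
        arr.binary_op(1, "add")
    except (BackendNotImplementedError, TypeError) as exc:
        print(type(arr).__name__, "->", type(exc).__name__)
# GenericArrowArray -> BackendNotImplementedError
# StringArrowArray -> TypeError
```

The trade-off is consistency versus detail: the subclass-only variant keeps pyarrow's exact kernel message for general ArrowDtype usage, while the diff under review places the try/except in the generic implementation in pandas/core/arrays/arrow/array.py.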

WillAyd · Aug 26, 2024

I like the from err; I don't see much harm in including all of that information.


jorisvandenbossche · 2024-08-27T10:22:30Z

OK, this should be fully ready for review now

WillAyd

This is a nice change - great work!

jbrockmendel · 2024-08-27T15:48:42Z

pandas/tests/extension/test_arrow.py

-                pytest.mark.xfail(
-                    raises=TypeError, reason="Can only string multiply by an integer."
-                )
-            )


nice to see these go!

jbrockmendel

LGTM

WillAyd · 2024-08-27T15:51:53Z

thanks @jorisvandenbossche


String dtype: avoid surfacing pyarrow excetion in binary operations

ae69206

jorisvandenbossche added the Error Reporting (Incorrect or improved errors from pandas), Strings (String extension data type and string data), and Arrow (pyarrow functionality) labels Aug 26, 2024

jorisvandenbossche requested review from WillAyd, jbrockmendel and mroeschke August 26, 2024 13:32

jorisvandenbossche commented Aug 26, 2024


jorisvandenbossche added 7 commits August 26, 2024 20:30

fix expected exception in test_arrow.py

ff1f57e

fix test_sub_fail for infer_string

227b42a

fix expected exception in test_string.py

2426cc5

try fix type annotations

34a970a

try fix type annotations

649dad1

fix failures

946abca

clean-up more tests

ea4163d

jorisvandenbossche added this to the 2.3 milestone Aug 27, 2024

try fixing type annotations

2e9b0d1

jorisvandenbossche changed the title ~~String dtype: avoid surfacing pyarrow excetion in binary operations~~ String dtype: avoid surfacing pyarrow exception in binary operations Aug 27, 2024

jorisvandenbossche added 3 commits August 27, 2024 10:49

try fixing type annotations

44c0090

Merge remote-tracking branch 'upstream/main' into string-dtype-operations-error

cda40ee

fixup

843671c

WillAyd approved these changes Aug 27, 2024


jbrockmendel reviewed Aug 27, 2024


jbrockmendel approved these changes Aug 27, 2024


WillAyd merged commit 67bec1f into pandas-dev:main Aug 27, 2024
47 checks passed

jorisvandenbossche deleted the string-dtype-operations-error branch August 27, 2024 16:07

jorisvandenbossche mentioned this pull request Aug 27, 2024

TRACKER: new default String dtype (pyarrow-backed, numpy NaN semantics) #54792 (Open, 41 tasks)

jorisvandenbossche added the backported label Oct 10, 2024

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Oct 10, 2024

String dtype: avoid surfacing pyarrow exception in binary operations (pandas-dev#59610)

8386795

jorisvandenbossche added a commit that referenced this pull request Oct 10, 2024

String dtype: avoid surfacing pyarrow exception in binary operations (#59610)

daa46c1
