Related to pandas-dev/pandas#52434. Currently utf8_slice_codeunits doesn't support a default start value. It would be nice to support a default of 0 if step > 0 and len - 1 if step < 0, for parity with pandas.
Component(s)
Python
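For reference, the requested defaults match what plain Python slicing does when start is omitted. The snippet below is only an illustration of that behaviour, not Arrow code; the helper name `default_start` is made up for this example.

```python
def default_start(length: int, step: int) -> int:
    """Return the start index Python uses when a slice omits `start`.

    Hypothetical helper for illustration only: it relies on the built-in
    slice.indices(), which resolves omitted bounds for a given length.
    """
    start, _stop, _step = slice(None, None, step).indices(length)
    return start


s = "abcdefghijklmnabcdefghijkln"

# Positive step: slicing starts at the beginning of the string.
print(default_start(len(s), 2))   # 0

# Negative step: slicing starts at the last character.
print(default_start(len(s), -9))  # 26

# With start omitted, Python picks the last character for a negative step.
print(s[:8:-9])                   # nd
```

Because the default depends on each string's length, a single integer start cannot express it for an array whose strings have different lengths, which is why a None default is needed.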
jorisvandenbossche changed the title from "Allow utf8_slice_codeunits to support default start value of None to support strings of different length" to "[C++][Python] Allow utf8_slice_codeunits to support default start value of None to support strings of different length" on Apr 6, 2023
@rohanjain101 thanks for the report! I was going to suggest using sys.maxsize as start (it is larger than the length of any string in the input, so slicing would always begin from the end), but apparently you can easily run into a segfault with that: opened #34928
As a non-ideal workaround, you could find the length of the longest string in your input array with pa.compute.max(pa.compute.utf8_length(arr)) (or just pick a reasonably large value that is not close to sys.maxsize), and use that as the start value:
In [1]: pa.compute.utf8_slice_codeunits("abcdefghijklmnabcdefghijkln", start=1000, stop=8, step=-9)
Out[1]: <pyarrow.StringScalar: 'nd'>