[Python] Support Binary/StringView in PyArrow #39633

Open
3 of 4 tasks
jorisvandenbossche opened this issue Jan 16, 2024 · 2 comments


jorisvandenbossche commented Jan 16, 2024

The new Binary and String View format types have been added to C++ (#37792, basic implementation), but not yet exposed to Python.

This is an overview issue for adding support for those types to pyarrow:

jorisvandenbossche added a commit that referenced this issue Jan 30, 2024
…es (#39652)

### Rationale for this change

First step for #39633: exposing the Array, DataType and Scalar classes for BinaryView and StringView, such that those can already be represented in pyarrow.

(I exposed a variant of StringBuilder as well, just for now to be able to create test data)

* Closes: #39651

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
jorisvandenbossche added a commit that referenced this issue Feb 7, 2024
…hon objects (#39853)

Next step for Binary/StringView support in Python (#39633), now adding it to the python->arrow conversion code path.
* Closes: #39852

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
jorisvandenbossche added a commit that referenced this issue Feb 22, 2024
…as (#40093)

Last step for Binary/StringView support in Python (#39633), now adding it to the arrow->pandas/numpy conversion code path.
* Closes: #40092

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
a10y commented Oct 22, 2024

@jorisvandenbossche There seems to be a long tail of compute functions that are not currently supported, namely:

  • casting between String and StringView
  • comparison operations
  • Scalar generation

Are there separate issues tracking those or is this the one? If this is the one, I'm curious about the priority of addressing those.

@jorisvandenbossche
Member Author

Yes, indeed, in general the string view type is not yet widely supported.

> casting between String and StringView

That should be working now with the latest 18.0 release.

> Are there separate issues tracking those or is this the one?

Most of those issues will have to be fixed or implemented on the C++ side. One such issue about adding more functionality is #39634.

> Scalar generation

What exactly do you mean by this item?
