[Python] RecordBatchReader constructor from stream object implementing the PyCapsule Protocol #39217

Closed
Tracked by #39195 ...
jorisvandenbossche opened this issue Dec 13, 2023 · 0 comments · Fixed by #39218

jorisvandenbossche commented Dec 13, 2023

In #37797 we added the dunder methods for the Arrow PyCapsule Protocol, and we also added support for detecting objects that implement the protocol in the pa.array(..), pa.record_batch(..) and pa.schema(..) constructors. For example, you can create a pyarrow array with pa.array(obj) given any object obj that supports the interface by defining __arrow_c_array__.

But for stream objects, we don't have an equivalent factory function that creates a RecordBatchReader. I therefore think it would be good to add a public RecordBatchReader constructor for stream objects implementing the protocol, so that users don't need to call the private _import_from_c_capsule method for this use case. For example, RecordBatchReader.from_stream?

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 13, 2023
…r objects implementing the Arrow PyCapsule protocol
jorisvandenbossche added a commit that referenced this issue Jan 8, 2024
…cts implementing the Arrow PyCapsule protocol (#39218)

### Rationale for this change

In contrast to Array, RecordBatch and Schema, for the C Stream (mapping to RecordBatchReader) we don't yet have an equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.

For that reason, this proposes an explicit constructor class method: `RecordBatchReader.from_stream` (this is quite a generic name, so other name suggestions are certainly welcome).

### Are these changes tested?
TODO

* Closes: #39217

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche jorisvandenbossche added this to the 15.0.0 milestone Jan 8, 2024
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)
ion-elgreco added a commit to delta-io/delta-rs that referenced this issue Jul 18, 2024
…2534)

# Description

Adds support for the [Arrow PyCapsule
interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).

Since pyarrow is already a required dependency, this takes the minimal
route of converting pycapsule interface objects into pyarrow objects.
This requires pyarrow 15 or higher for the stream conversion
(apache/arrow#39217).

This doesn't modify the existing hard-coded support for pyarrow and
pandas.

# Related Issue(s)

- closes #2376

# Documentation

---------

Co-authored-by: Ion Koutsouris <[email protected]>