-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python/adbc_driver_manager: use PyCapsule for handles to C structs #70
Comments
Yes, IIRC the official R-Python interop library supports shipping PyCapsule to R (cc @paleolimbot ). |
Ah, that doesn't sound helpful actually, because reticulate is only considering the case where the "context" of the capsule (an embedded opaque pointer) was created by R. Here, the case AFAIU would be to pass a Python-created PyCapsule to R. Casting a Python-managed opaque pointer to a R SEXP is probably a bad idea (though who knows :-)). |
I mean, the code reference you gave was very helpful :-) |
I see..that's the other direction (R to Python). I assumed that would roundtrip but apparently it doesn't ( https://github.com/rstudio/reticulate/blob/74d139b1772d29dce24b22a828f2972ac97abacf/src/python.cpp#L755-L756 ).
I think in this case that's exactly what we want! It's easy to work around, though as long as we can get an address (similar to what the Arrow R package does except in this case the R external pointer will keep a Python object reference). |
Well, |
Nothing! reticulate doesn't handle them. In this case the R package would implement |
Going to punt this since I think if we start returning PyCapsule, you won't be able to work with that unless you also have PyArrow 14, and I don't want to bring up the minimum PyArrow version so much right now |
I'm not sure I understand the relationship with PyArrow here. What are the PyCapsules in this issue supposed to convey? |
We return custom wrappers around C Data Interface objects, we want to return compliant PyCapsules (and Joris already prototyped that), but first I also want some way to make the PyCapsule also work with versions of PyArrow that can't import them (since they're opaque to Python code, right?) |
Indeed, they are, except using |
I think the idiom to apply would be to implement |
But we have other methods (like get_table_schema) that need to return capsules. |
I think there it would return an object that implements pyarrow.schema(conn.get_table_schema()) |
That still requires you to have PyArrow 14, which is the main thing I wanted to avoid |
I guess that means I'd keep the existing objects we have (that expose the raw address), and just have them implement the new dunder methods in addition |
Well, given the PyArrow CVE I think the next release of ADBC will have to require 14.0.1 as a baseline. |
There's the hotfix as well if you won't want to require 14.0.1. |
Right - but I think it behooves another Arrow project to follow our own advice 🙂 |
Started looking at this again, exploring how we can adopt the capsule protocol in ADBC python. The low-level interface ( I think in theory we could replace those current Handle classes with PyCapsules directly (and in the higher-level code, we can still extract the pointer from the PyCapsule when having to deal with a pyarrow version that doesn't support capsules directly).
One advantage of keeping some generic wrapper/handle class that has the dunders vs returning pycapsules directly, is that the return values of the low-level interface can then more easily be passed to a library that expects an object with the dunder defined instead of the capsules directly (i.e. how we currently implemented support for this in pyarrow, e.g. The DBAPI is much more tied to pyarrow, so I don't know if, on the short term, we want to enable getting arrow data without a dependency on Just getting an overview for myself:
Changing the methods that currently return a pyarrow Schema/Table/RecordBatchReader to return a generic object implementing the dunders seems too much of a breaking change (and would also be much less convenient for pyarrow users, e.g. requiring a users to do Of course, assuming you have pyarrow 14+ installed, by returning pyarrow objects which implement the dunders, we of course also automatically make the capsule protocol available in ADBC. So it's more a question to what extent we want to make it possible to use the DBAPI layer without pyarrow. So in summary, short term I think I would do:
And then longer term we can think about whether we also want to enable using the DBAPI in some form without a runtime dependency on pyarrow. |
+1 |
Thanks! Yup, this is my plan (I started working on it over holiday but realized it was...holiday) |
… Capsule interface (#1346) Addresses #70 This PR adds the dunder methods to the Handle classes of the low-level interface (which already enables using the low-level interface without pyarrow and with the capsule protocol). And secondly, in the places that accept data (eg ingest/bind), it now also accepts objects that implement the dunders in addition to hardcoded support for pyarrow. --------- Co-authored-by: David Li <[email protected]>
Completed in #1346 |
We should try to use the 'native' type of the C API. Apparently, this will also ease interoperability with R.
The text was updated successfully, but these errors were encountered: