Feature Request: Consider removing PyArrow as a required DBAPI dependency #2413
Comments
I suppose it's doable enough to change it so that as long as you don't try and read result sets (which needs something to process the Arrow data), we can accept PyCapsule and not require PyArrow. You can always use the low-level interface as well, though it doesn't implement DBAPI. (The proposal here would technically not implement DBAPI either, but I suppose it'd be closer!)
Technically nanoarrow for Python can do row tuples and possibly a few other things (but I would personally consider it something that should be opted in to and treated experimentally).
I was thinking about that, but I figure if we can get away with no dependencies at all that might be useful too.
Probably best would be to eliminate the dependency and give an example that uses |
Thanks both for the prompt responses. For a little more context behind the request: if it is completed, I want to propose to Polars that they also remove PyArrow as a required dependency for using the ADBC engine in database I/O.
Upon reading the
Sorry for any confusion, I was conflating DBAPI 2.0 (PEP 249) and the
That would be ideal from my (and I dare say other libraries') POV, where some features/APIs require dependencies (e.g., |
I think that's reasonable to add at the same time
I just mean that, because certain methods (like fetchone) wouldn't work, we technically wouldn't be in full compliance, but otherwise we would appear to look and function like a real DBAPI driver (unless you try and fetch Python objects from result sets).
That would be awesome!
Ah got it, thanks for clarifying.
I know we've gone back and forth on it, but maybe it's worth having nanoarrow as a dependency to stay compliant with the DBAPI? I also think it would be a good way to promote more usage of that library.
I think we could do something where, if you have neither nanoarrow nor pyarrow installed, it will function with limited support; otherwise it can use whichever one is available (not sure what would happen if you have both).
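For illustration, the fallback described above could be sketched roughly like this. This is only a sketch under assumptions, not how the driver manager actually does it: `pick_arrow_backend`, `require_backend`, and the local `NotSupportedError` class are hypothetical names, and preferring pyarrow when both packages are installed is an arbitrary choice made here to answer the "what if you have both" question.

```python
import importlib.util


class NotSupportedError(Exception):
    """Local stand-in for the DBAPI 2.0 NotSupportedError (hypothetical here)."""


def pick_arrow_backend(preferred=("pyarrow", "nanoarrow"),
                       find_spec=importlib.util.find_spec):
    """Return the first installed backend name, or None for limited support.

    The order of `preferred` decides which library wins when both are
    installed. That order is an assumption of this sketch.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return None


def require_backend(backend, method_name):
    """Raise a clear error when a method needs an Arrow library to build rows."""
    if backend is None:
        raise NotSupportedError(
            f"{method_name}() needs pyarrow or nanoarrow to convert Arrow "
            "data into Python objects; neither is installed"
        )
    return backend
```

With something like this, methods such as fetchone would call `require_backend(...)` first, while paths that only hand Arrow data across as PyCapsules would never need to.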
Perhaps concretely:
|
What feature or improvement would you like to see?
Thanks to the Arrow PyCapsule Interface, one can read data directly into supported libraries (e.g., DuckDB, Polars) without requiring PyArrow. This is great!
Writing, on the other hand (e.g., via adbc_ingest), does require PyArrow. This is a bit of a shame given that the data supplied to adbc_ingest also supports the PyCapsule Interface.
Furthermore, removal of PyArrow as a required DBAPI dependency would allow reading data with a higher level API.
These changes would be particularly beneficial to a library like Polars, removing the need for PyArrow completely for database I/O.
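As a rough sketch of why PyArrow is not strictly needed on the ingest side: the Arrow PyCapsule Interface is a set of dunder methods (`__arrow_c_stream__`, `__arrow_c_array__`), so a consumer can recognise supported input by duck typing alone. The helper `arrow_capsule_kind` and the `FakeStreamExporter` class below are hypothetical names for illustration only, not part of ADBC.

```python
def arrow_capsule_kind(obj):
    """Classify obj by the Arrow PyCapsule export method it offers.

    Returns "stream", "array", or None. No Arrow library is imported;
    the check relies only on the protocol's method names.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"
    if hasattr(obj, "__arrow_c_array__"):
        return "array"
    return None


class FakeStreamExporter:
    """Hypothetical stand-in for a DataFrame-like object (illustration only)."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real exporter would return a PyCapsule wrapping an ArrowArrayStream.
        raise NotImplementedError("illustration only")


print(arrow_capsule_kind(FakeStreamExporter()))
print(arrow_capsule_kind([1, 2, 3]))
```

A real consumer would then call the method and pass the resulting capsule to native code, which is the step that does not need PyArrow.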
NB. I am certainly no expert regarding this, so please correct me if I have said anything incorrect or there are fundamental limitations that make this request unreasonable.