feat: add to_pandas_batches #6805
Comments
@thegreymatter Nice to see you again, and thanks for the issue!
Yeah, this is a bit of a pickle. The issue is that pandas is more flexible in this case because it can hold arbitrary Python objects in a column. It turns out this is useful for Snowflake's complex types, since those types are effectively untyped JSON. PyArrow is more efficient across the board, but it lacks an in-memory representation more specific than a string type for Snowflake's complex types.
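A minimal illustration of that flexibility gap, using plain pandas and pyarrow (this is not ibis's actual Snowflake handling, just the general idea):

```python
# Illustration only: how pandas and pyarrow can represent the same
# nested, JSON-like values.
import json

import pandas as pd
import pyarrow as pa

rows = [{"a": 1, "b": ["x", "y"]}, {"a": 2}]

# pandas can hold the Python dicts directly in an object-dtype column.
df = pd.DataFrame({"payload": rows})
print(df["payload"].dtype)     # object
print(type(df["payload"][0]))  # <class 'dict'>

# An Arrow column needs a concrete type; lacking one, the values end up
# being carried around as JSON strings.
tbl = pa.table({"payload": [json.dumps(r) for r in rows]})
print(tbl.schema.field("payload").type)  # string
```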
I'm not opposed. The fallback implementation for pyarrow batches should work for this case too, although I think we should decouple pandas batching from pyarrow so we can still provide the more "flexible" typing.
Are there other limitations of just having to_pyarrow, or is that the only one? If that is the only one, then I would vote for keeping things simple and just supporting .to_pyarrow(). I want to see pyarrow be the lingua franca, the "hub of the conversion wheel", where every other implementation can convert to/from it, so the conversion path always goes through pyarrow.
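As a sketch of that hub idea (assuming a backend that can execute the expression, with polars used purely as an example target):

```python
# Sketch of pyarrow as the conversion hub: materialize once, fan out from there.
# Assumes the default in-memory backend (duckdb) is available to execute the memtable.
import ibis
import polars as pl

t = ibis.memtable({"x": [1, 2, 3], "y": ["a", "b", "c"]})

arrow_table = t.to_pyarrow()            # backend -> pyarrow (one hop)
pandas_df = arrow_table.to_pandas()     # pyarrow -> pandas (one hop)
polars_df = pl.from_arrow(arrow_table)  # pyarrow -> polars (one hop)
```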
I think this makes sense to do. While it is additional code to maintain, I'm not sure it further entrenches pandas anywhere or reduces the need/desire to Arrow-ize all the things. The implementation might even be as simple as:

```python
yield from (
    PandasSchema.convert_table(batch.to_pandas(), expr.schema())
    for batch in expr.to_pyarrow_batches()
)
```
OK, sounds good! It will definitely be appreciated and used by people. Does that implementation still suffer from the drawback with Snowflake complex types, since aren't you still going snowflake -> pyarrow -> pandas?
No, because the `PandasSchema.convert_table` step converts each batch back using the expression's schema, which restores the more flexible pandas typing.
Is your feature request related to a problem?
Today I lack the ability to easily get pandas batches from an ibis expression.
I can do it manually by calling to_pyarrow_batches() and then to_pandas() on each batch, as in the sketch below, but there are several issues with that approach.
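A sketch of that manual workaround, using only public APIs (the backend, table name, and the assumption that chunk_size controls batching in to_pyarrow_batches are illustrative):

```python
# Manual workaround: stream pyarrow record batches and convert each to pandas.
import ibis

con = ibis.duckdb.connect()          # any backend that supports to_pyarrow_batches
expr = con.table("events")           # hypothetical table name

for batch in expr.to_pyarrow_batches(chunk_size=100_000):
    df = batch.to_pandas()           # pyarrow.RecordBatch -> pandas.DataFrame
    ...                              # work on each pandas chunk here
```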
Describe the solution you'd like
Add a to_pandas_batches() method on table expressions.
By default it could be implemented using the pyarrow-batches-to-pandas-batches approach, and if a specific backend has a better implementation, that can be used instead.
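If the method is added, usage could look something like the following (the name and the chunk_size argument mirror this proposal and to_pyarrow_batches; they are not a confirmed API):

```python
# Hypothetical usage of the proposed method; names and signature follow
# this issue's proposal, not a released API.
import ibis

con = ibis.duckdb.connect("data.ddb")   # hypothetical database file
expr = con.table("events")              # hypothetical table name

for df in expr.to_pandas_batches(chunk_size=100_000):
    df.describe()                       # each df is a pandas DataFrame chunk
```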
What version of ibis are you running?
6.0.0
What backend(s) are you using, if any?
No response