feat: add support for polars, arrow RBR as RHS of a join #9571

Closed
gforsyth opened this issue Jul 13, 2024 · 1 comment · Fixed by #9661
Labels
feature Features or general enhancements
Comments

@gforsyth
Member

We currently support transparent memtable creation from pandas DataFrames and pyarrow Tables if they are provided as the RHS of a join expression.
We should extend that to include the other data inputs that memtable supports, namely polars DataFrames and Arrow RecordBatchReaders.

@jcrist
Member

jcrist commented Jul 22, 2024

I'd vote to drop support for this instead. For backends that don't have efficient memtables, implicitly creating a memtable multiple times will result in lower performance than calling ibis.memtable once and reusing it. Forcing users to be explicit when coercing other inputs to ibis feels more correct to me. It's also a bit weird to do this in join methods but not in other table-taking methods like ibis.union.

@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Sep 23, 2024
@github-actions github-actions bot added this to the 10.0 milestone Sep 23, 2024
ncclementi pushed a commit to ncclementi/ibis that referenced this issue Sep 24, 2024
…s-project#9661)

## Description of changes

We have (had) limited support for passing in in-memory objects as the
RHS of a join, where we would create a memtable for the user and then
use that.  For backends where memtable creation is expensive, or for
queries where there may be multiple calls to the same in-memory data, it
is better to be explicit and first register the in-memory data with the
backend using either `memtable` or `create_table`.

BREAKING CHANGE: Passing a `pyarrow.Table` or a `pandas.DataFrame` as
the right-hand-side of a join is no longer supported.

To join against in-memory data, you can pass the in-memory object to
`ibis.memtable` or `con.create_table` and use the resulting table object
instead.


## Issues closed

* Resolves ibis-project#9571