feat: pyarrow datasets in future versions #10225

Closed
1 task done
szst11 opened this issue Sep 25, 2024 · 1 comment · Fixed by #10206
Labels
duckdb The DuckDB backend feature Features or general enhancements

Comments

szst11 commented Sep 25, 2024

Is your feature request related to a problem?

When I register a pyarrow Dataset, I get two deprecation warnings:

FutureWarning: `Backend.register` is deprecated as of v9.1; use the explicit `read_*` method for the filetype you are trying to read, e.g., read_parquet, read_csv, etc.

and

FutureWarning: `Backend.read_in_memory` is deprecated as of v9.1, removed in v10.0; Pass in-memory data to `create_table` instead.

What is the motivation behind your request?

A pyarrow Dataset can span many parquet files, and those files are indexed only once, when the dataset is created.
For now, .register() supports that.

If I read the parquet files directly with DuckDB, they are indexed again on every query.

The other option, create_table, is also not ideal: it loads the dataset into the database, whereas I want to keep the data in the parquet files and keep pushdown filtering.

Describe the solution you'd like

I'd like to be able to still access a pyarrow dataset without copying the data into memory.

What version of ibis are you running?

<10

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct
@szst11 szst11 added the feature Features or general enhancements label Sep 25, 2024
gforsyth (Member) commented

Hey @szst11 -- I'm working on that in #10206 -- you'll be able to pass the dataset to ibis.memtable and then use that in DuckDB without materializing the data and with pushdowns working.

@gforsyth gforsyth added the duckdb The DuckDB backend label Sep 25, 2024
@gforsyth gforsyth linked a pull request Sep 25, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Sep 26, 2024
@github-actions github-actions bot added this to the 10.0 milestone Sep 26, 2024