Include more operations in database pushdowns #1731

Open
universalmind303 opened this issue Sep 8, 2023 · 1 comment
Labels
feat 🎇 New feature or request

Comments

@universalmind303
Contributor

Query your data where it lives


Description

I think I recall someone saying the quote above earlier this week, and it got me thinking more about pushdowns.

Many of our datasources support complex pushdowns beyond the common predicate & projection pushdowns.

It'd be a really cool feature if we could push down every part of the query that the underlying source can execute, falling back to in-memory execution for whatever can't be pushed down.

Each source may support a different level of pushdown, but for something like Postgres we should be able to push down the majority of the query.
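To make the "push down what the source supports, run the rest in memory" idea concrete, here is a minimal sketch in plain Rust. Everything in it (`Op`, `Capabilities`, `split_pushdown`) is hypothetical and is not DataFusion's or this project's API. It assumes a single-input chain of operators listed from the scan upward, and it assumes each source can declare a flat list of operators it supports.

```rust
/// Relational operators a plan might contain (a toy subset, hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Projection,
    Filter,
    Sort,
    Limit,
    Aggregate,
}

/// Hypothetical description of what one datasource can execute itself.
pub struct Capabilities {
    pub name: &'static str,
    pub supported: &'static [Op],
}

impl Capabilities {
    pub fn supports(&self, op: Op) -> bool {
        self.supported.contains(&op)
    }
}

/// Split a chain of operators, ordered from the scan upward, into the prefix
/// the source can run and the remainder that must run in memory.
///
/// The split happens at the first unsupported operator: everything above it
/// consumes its output, so nothing above it is pushed down either.
pub fn split_pushdown(ops: &[Op], caps: &Capabilities) -> (Vec<Op>, Vec<Op>) {
    let cut = ops
        .iter()
        .position(|op| !caps.supports(*op))
        .unwrap_or(ops.len());
    (ops[..cut].to_vec(), ops[cut..].to_vec())
}
```

With a capability list covering all five operators, the whole chain lands in the first vector. With one that only lists `Projection` and `Filter`, a chain of `Filter, Aggregate, Sort, Limit` splits into `[Filter]` pushed down and the other three run locally. A real rule could be smarter (for example, reordering operators before splitting), which this sketch does not attempt.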

Nodes with multiple children (join/union) get pretty tricky: we could easily determine whether the children are the same kind of source, but we'd also need to compare them to make sure they use the same table provider.
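A rough sketch of that check, again in plain Rust with made-up types rather than DataFusion's: `SourceId`, `Plan`, and `single_source` are all hypothetical names. It assumes a provider can be identified by its kind plus a connection string, which is only one possible notion of "the same datasource".

```rust
/// Hypothetical identity of a table provider: the kind of source plus the
/// connection it points at. Two Postgres tables on different servers differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceId {
    pub kind: String,
    pub connection: String,
}

/// Toy plan tree with single-child and multi-child nodes.
pub enum Plan {
    Scan { table: String, source: SourceId },
    Filter { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
    Union { inputs: Vec<Plan> },
}

/// Returns the one source every scan under `plan` reads from, or `None` if
/// the subtree mixes sources (or is a union with no inputs).
pub fn single_source(plan: &Plan) -> Option<&SourceId> {
    match plan {
        Plan::Scan { source, .. } => Some(source),
        Plan::Filter { input } => single_source(input),
        Plan::Join { left, right } => {
            let l = single_source(left)?;
            let r = single_source(right)?;
            // Same kind is not enough: the full identity must match.
            if l == r { Some(l) } else { None }
        }
        Plan::Union { inputs } => {
            let mut children = inputs.iter();
            let first = single_source(children.next()?)?;
            for child in children {
                if single_source(child)? != first {
                    return None;
                }
            }
            Some(first)
        }
    }
}
```

A join of two tables behind the same connection returns `Some(source)` and could be pushed down as a unit. A join of two Postgres tables on different hosts returns `None`, so the join itself would stay in memory while each side is pushed down separately.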

Implementation notes

I think we'd need to

  • implement create_physical_expr for our different datasources (SQL dialects + Mongo).
  • create a physical plan optimization rule that pushes all nodes down as far as possible.
  • add some means of comparing table providers to ensure they are in fact the same datasource.
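For the first bullet, the per-dialect work could look roughly like the sketch below. It is a toy expression-to-SQL renderer in plain Rust, not the real create_physical_expr and not tied to any DataFusion type: `Dialect`, `Expr`, `to_sql`, and the `LocalOnly` variant are all hypothetical. It covers SQL only; a Mongo target would need a different output form.

```rust
/// Two SQL dialects that differ in how identifiers are quoted.
pub enum Dialect {
    Postgres,
    MySql,
}

impl Dialect {
    /// Quote an identifier, doubling any embedded quote character.
    fn quote_ident(&self, ident: &str) -> String {
        match self {
            Dialect::Postgres => format!("\"{}\"", ident.replace('"', "\"\"")),
            Dialect::MySql => format!("`{}`", ident.replace('`', "``")),
        }
    }
}

/// Toy expression tree (hypothetical).
pub enum Expr {
    Column(String),
    Int(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    /// Stand-in for anything the remote side cannot evaluate, e.g. a local UDF.
    LocalOnly,
}

/// Render `expr` as SQL for `dialect`, or return `None` if any part of it
/// cannot be expressed remotely. `None` is the signal to keep the node in memory.
pub fn to_sql(expr: &Expr, dialect: &Dialect) -> Option<String> {
    Some(match expr {
        Expr::Column(name) => dialect.quote_ident(name),
        Expr::Int(v) => v.to_string(),
        Expr::Gt(l, r) => format!("({} > {})", to_sql(l, dialect)?, to_sql(r, dialect)?),
        Expr::And(l, r) => format!("({} AND {})", to_sql(l, dialect)?, to_sql(r, dialect)?),
        Expr::LocalOnly => return None,
    })
}
```

Returning `Option` keeps the fallback decision in one place: the optimization rule from the second bullet would only push a node down when every expression it carries renders successfully for the target source.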

We could probably start small by implementing #1507, then follow up by trying to push down something simple like a UniqueExec or SortExec.

@universalmind303 universalmind303 added the feat 🎇 New feature or request label Sep 8, 2023
@backkem

backkem commented Jan 31, 2024

I took a stab at query federation in backkem/datafusion-federation. It's loosely based on the discussion in apache/datafusion#970. I have a first example that cuts the plan into sub-plans and pushes down a join to the remote database: sqlite-partial.

There are many optimizations left to make but it's a start.
