Hi there! I stumbled upon this project from this discussion: apache/datafusion#970
First of all, thanks everybody for this repo! Federation support for DataFusion is highly relevant to me, and I was thinking about building something similar until I found this project. I have a few questions about performance in the situations that matter most to me.
The first one: how performant is a join of two remote tables, and how does it work? Are you doing something like querying the join equality operands, doing a hash join in memory, and then fetching only the relevant rows? (For example, a join of two PostgreSQL tables that live on different servers, much like a foreign data wrapper (FDW).) Or are there no optimisations in this regard yet?
Also, I was curious about the final goal of the project: will it be merged into the main DataFusion repo, or is it expected to stay in a separate repo and crate? Thanks in advance
> The first one: how performant is a join of two remote tables, and how does it work? Are you doing something like querying the join equality operands, doing a hash join in memory, and then fetching only the relevant rows? (For example, a join of two PostgreSQL tables that live on different servers, much like a foreign data wrapper (FDW).) Or are there no optimisations in this regard yet?
Federated joins are not optimized yet; see #23 for some discussion of this. For most queries, each remote table gets a full table scan and DataFusion performs the join locally. Pushing more of that work down to the federated table providers would be an awesome improvement.
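To make the difference concrete, here is a minimal, standard-library-only Rust sketch. It is not the datafusion-federation API: `RemoteTable`, `rows_sent`, `scan_where_key_in`, and both `join_*` functions are hypothetical names invented for illustration. `join_with_full_scans` mirrors the behaviour described above (scan both sides in full, hash join locally), while `join_with_key_pushdown` shows the kind of optimisation the question asks about, where the join keys from one side are sent to the other side as a filter.

```rust
use std::collections::HashMap;

/// A row is (join_key, payload); a stand-in for a record batch.
type Row = (i64, String);

/// Hypothetical stand-in for a remote table. `rows_sent` counts the rows
/// that would cross the network in a real federated setup.
struct RemoteTable {
    rows: Vec<Row>,
    rows_sent: usize,
}

impl RemoteTable {
    fn new(rows: Vec<Row>) -> Self {
        RemoteTable { rows, rows_sent: 0 }
    }

    /// Full table scan: every row is transferred to the local engine.
    fn scan_all(&mut self) -> Vec<Row> {
        self.rows_sent += self.rows.len();
        self.rows.clone()
    }

    /// Scan with a key filter evaluated remotely (think `WHERE key IN (...)`).
    fn scan_where_key_in(&mut self, keys: &[i64]) -> Vec<Row> {
        let out: Vec<Row> = self
            .rows
            .iter()
            .filter(|(k, _)| keys.contains(k))
            .cloned()
            .collect();
        self.rows_sent += out.len();
        out
    }
}

/// Local in-memory hash join: build on `left`, probe with `right`.
fn hash_join(left: &[Row], right: &[Row]) -> Vec<(i64, String, String)> {
    let mut build: HashMap<i64, Vec<&String>> = HashMap::new();
    for (k, v) in left {
        build.entry(*k).or_default().push(v);
    }
    let mut out = Vec::new();
    for (k, rv) in right {
        if let Some(lvs) = build.get(k) {
            for lv in lvs {
                out.push((*k, (*lv).clone(), rv.clone()));
            }
        }
    }
    out
}

/// Unoptimized plan: scan both remote tables in full, then join locally.
fn join_with_full_scans(a: &mut RemoteTable, b: &mut RemoteTable) -> Vec<(i64, String, String)> {
    let left = a.scan_all();
    let right = b.scan_all();
    hash_join(&left, &right)
}

/// Assumed optimisation: scan `a`, then fetch only the matching keys from `b`.
fn join_with_key_pushdown(a: &mut RemoteTable, b: &mut RemoteTable) -> Vec<(i64, String, String)> {
    let left = a.scan_all();
    let mut keys: Vec<i64> = left.iter().map(|(k, _)| *k).collect();
    keys.sort_unstable();
    keys.dedup();
    let right = b.scan_where_key_in(&keys);
    hash_join(&left, &right)
}
```

Both functions return the same rows; they differ only in how many rows each remote side has to send. The pushdown variant pays off when one side is small and selective, which is why it is sketched here as an assumption rather than as something the project currently does.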
> Also, I was curious about the final goal of the project: will it be merged into the main DataFusion repo, or is it expected to stay in a separate repo and crate? Thanks in advance
If datafusion-federation as a whole matures and proves useful to a significant portion of the overall user base, then yes, it could be merged into the upstream repo. We could also keep pushing small bits of functionality upstream over time. We have in fact already done this for the Plan->SQL code: apache/datafusion#9494.