Implement optimization rule for broadcast/CollectLeft hash join #28
When the left side of a join is very small compared to the right side, it is better to load the left side once (and, in Ballista, broadcast it to every executor) instead of repartitioning both inputs.
In DataFusion this avoids hash partitioning; in Ballista it also avoids shuffling data between executors.
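
A minimal sketch of the decision such a rule might make, assuming size estimates are available for both inputs. All names here (`JoinMode`, `SizeEstimate`, `choose_join_mode`) and both thresholds are hypothetical and used only for illustration; this is not DataFusion's or Ballista's actual API.

```rust
/// How the inputs of a hash join are distributed (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinMode {
    /// Both inputs are hash-partitioned on the join keys.
    Partitioned,
    /// The left input is collected once and shared by every right partition.
    CollectLeft,
}

/// Estimated input size in bytes; `None` means the size is unknown.
#[derive(Debug, Clone, Copy, Default)]
pub struct SizeEstimate {
    pub total_bytes: Option<u64>,
}

/// Picks `CollectLeft` only when both sizes are known, the left side fits
/// under `max_collect_bytes`, and the right side is at least `min_ratio`
/// times larger than the left. Otherwise falls back to `Partitioned`.
pub fn choose_join_mode(
    left: SizeEstimate,
    right: SizeEstimate,
    max_collect_bytes: u64,
    min_ratio: u64,
) -> JoinMode {
    match (left.total_bytes, right.total_bytes) {
        (Some(l), Some(r))
            if l <= max_collect_bytes && l.saturating_mul(min_ratio) <= r =>
        {
            JoinMode::CollectLeft
        }
        // Unknown sizes or a left side that is not clearly small:
        // keep the safe default of hash-partitioning both inputs.
        _ => JoinMode::Partitioned,
    }
}
```

Falling back to `Partitioned` when either size is unknown is a conservative choice in this sketch: collecting a left side that turns out to be large would hold the whole input in memory at once.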