Sort Merge Join #2

Closed
wants to merge 15 commits into from

@yjshen (Owner) commented on Nov 4, 2021

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@yjshen changed the title from "wip: sort-merge-join" to "Sort Merge Join" on Nov 10, 2021
@yjshen closed this on Nov 22, 2021
yjshen pushed a commit that referenced this pull request Dec 20, 2021
* # This is a combination of 3 commits.
# This is the 1st commit message:

Add Display for Expr::BinaryExpr

# This is the commit message #2:

Update logical_plan/operators tests

# This is the commit message #3:

rebase and debug display for non binary expr

* Add Display for Expr::BinaryExpr

Update logical_plan/operators tests

rebase and debug display for non binary expr

Add Display for Expr::BinaryExpr

Update logical_plan/operators tests

Updating tests

Update aggregate display

Updating tests without aggregate

More tests

Working on agg/scalar functions

Fix binary_expr in create_name function and attendant tests

More tests

More tests

Doc tests

Rebase and update new tests

* Submodule update

* Restore submodule references from master

Co-authored-by: Andrew Lamb
yjshen pushed a commit that referenced this pull request Dec 20, 2021
* [feat] stubs for provider re-organization

* [feat] implement infer_schema to make test pass

* [wip] trying to implement pruned_partition_list

* [typo]

* [fix] replace enum with trait for extensibility

* [fix] add partition cols to inferred schema

* [feat] forked file format executors

avro still missing

* [doc] comments about why we are flattening

* [test] migrated tests to file formats

* [test] improve listing test

* [feat] add avro to refactored format providers

* [fix] remove try from new when unnecessary

* [fix] remove try_ from ListingTable new

* [refacto] renamed format module to file_format

also removed statistics from the PartitionedFile abstraction

* [fix] removed Ballista stubs

* [fix] rename create_executor

* [feat] added store

* [fix] Clippy

* [test] improve file_format tests with limit

* [fix] limit file system read size

* [fix] avoid fetching unnecessary stats after limit

* [fix] improve readability

* [doc] improve comments

* [refacto] keep async reader stub

* [doc] cleanup comments

* [test] test file listing

* [fix] add last_modified back

* [refacto] simplify csv reader exec

* [refacto] change SizedFile back to FileMeta

* [doc] comment clarification

* [fix] avoid keeping object store as field

* [refacto] grouped params to avoid too_many_arguments

* [fix] get_by_uri also returns path

* [fix] ListingTable at store level instead of registry

* [fix] builder take self and not ref to self

* Replace file format providers (#2)

* [fix] replace file format providers in datafusion

* [lint] clippy

* [fix] replace file format providers in ballista

* [fix] await in python wrapper

* [doc] clearer doc about why sql() is async

* [doc] typos and clarity

* [fix] missing await after rebase
@yjshen deleted the mm_sort_join branch on April 22, 2022, 08:32
yjshen pushed a commit that referenced this pull request Mar 16, 2023
* Optimize `regex_replace` for scalar patterns

* Change the hot-path on `regexp_replace` to only variadic source (#2)
yjshen pushed a commit that referenced this pull request Mar 16, 2023
* Initial commit

* initial commit

* failing test

* table scan projection

* closer

* test passes, with some hacks

* use DataFrame (#2)

* update README

* update dependency

* code cleanup (#3)

* Add support for Filter operator and BinaryOp expressions (#4)

* GitHub action (#5)

* Split code into producer and consumer modules (#6)

* Support more functions and scalar types (#7)

* Use substrait 0.1 and datafusion 8.0 (#8)

* use substrait 0.1

* use datafusion 8.0

* update datafusion to 10.0 and substrait to 0.2 (#11)

* Add basic join support (#12)

* Added fetch support (#23)

Added fetch to consumer

Added limit to producer

Added unit tests for limit

Added roundtrip_fill_none() for testing when None input can be converted to 0

Update src/consumer.rs

Co-authored-by: Andy Grove

* Upgrade to DataFusion 13.0.0 (#25)

* Add sort consumer and producer (#24)

Add consumer

Add producer and test

Modified error string

* Add serializer/deserializer (#26)

* Add plan and function extension support (#27)

* Add plan and function extension support

* Removed unwraps

* Implement GROUP BY (#28)

* Add consumer, producer and tests for aggregate relation

Change function extension registration from absolute to relative anchor
(reference)

Remove operator to/from reference

* Fixed function registration bug

* Add test

* Addressed PR comments

* Changed field reference from mask to direct reference (#29)

* Changed field reference from masked reference to direct reference

* Handle unsupported case (struct with child)

* Handle SubqueryAlias (#30)

Fixed aggregate function register bug

* Add support for SELECT DISTINCT (#31)

Add test case

* Implement BETWEEN (#32)

* Add case (#33)

* Implement CASE WHEN

* Add more case to test

* Addressed comments

* feat: support explicit catalog/schema names in ReadRel (#34)

* feat: support explicit catalog/schema names in ReadRel

Signed-off-by: Ruihang Xia

* fix: use re-exported expr crate

Signed-off-by: Ruihang Xia

Signed-off-by: Ruihang Xia

* move files to subfolder

* RAT

* remove rust.yaml

* revert .gitignore changes

* tomlfmt

* tomlfmt

Signed-off-by: Ruihang Xia
Co-authored-by: Daniël Heres
Co-authored-by: JanKaul
Co-authored-by: nseekhao
Co-authored-by: Ruihang Xia
yjshen pushed a commit that referenced this pull request Mar 31, 2024
* refactor `TreeNode::rewrite()`

* use handle_tree_recursion in `Expr`

* use macro for transform recursions

* fix api

* minor fixes

* fix

* don't trust `t.transformed` coming from transformation closures, keep the old way of detecting if changes were made

* rephrase todo comment, always propagate up `t.transformed` from the transformation closure, fix projection pushdown closure

* Fix `TreeNodeRecursion` docs

* extend Skip (Prune) functionality to Jump as it is defined in https://synnada.notion.site/synnada/TreeNode-Design-Proposal-bceac27d18504a2085145550e267c4c1

* fix Jump and add tests

* jump test fixes

* fix clippy

* unify "transform" traversals using macros, fix "visit" traversal jumps, add visit jump tests, ensure consistent naming `f` instead of `op`, `f_down` instead of `pre_visit` and `f_up` instead of `post_visit`

* fix macro rewrite

* minor fixes

* minor fix

* refactor tests

* add transform tests

* add apply, transform_down and transform_up tests

* refactor tests

* test jump on both a and e nodes in both top-down and bottom-up traversals

* better transform/rewrite tests

* minor fix

* simplify tests

* add stop tests, reorganize tests

* fix previous merges and remove leftover file

* Review TreeNode Refactor (#1)

* Minor changes

* Jump doesn't ignore f_up

* update test

* Update rewriter

* LogicalPlan visit update and propagate from children flags

* Update tree_node.rs

* Update map_children's

---------

Co-authored-by: Mustafa Akur

* fix

* minor fixes

* fix f_up call when f_down returns jump

* simplify code

* minor fix

* revert unnecessary changes

* fix `DynTreeNode` and `ConcreteTreeNode` `transformed` and `tnr` propagation

* introduce TransformedResult helper

* fix docs

* restore transform as alias to transform_up

* restore transform as alias to transform_up 2

* Simplifications and comment improvements (#2)

---------

Co-authored-by: Berkay Şahin
Co-authored-by: Mustafa Akur
Co-authored-by: Mehmet Ozan Kabak
1 participant