Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor]: Convert Vec<PhysicalExpr> to HashSet<PhysicalExpr> #13612

Merged
merged 5 commits into from
Dec 3, 2024

Conversation

akurmustafa
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Vec<Arc<dynPhysicalExpr>>s inside EquivalenceClass is conceptually set. Previously, since Arc<dyn PhysicalExpr> didnot support Eq, we couldn't use it insideHashSet (or IndexSet). With recent changes in Datafusion, we can do so now. This PR converts Vec<Arc<dynPhysicalExpr>> into IndexSet<Arc<dynPhysicalExpr>> to make implementation simpler.

What changes are included in this PR?

Are these changes tested?

Existing tests

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions common Related to common crate labels Dec 1, 2024
@github-actions github-actions bot removed the common Related to common crate label Dec 1, 2024
Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement! Thanks @akurmustafa 👍

@@ -191,46 +190,48 @@ pub struct EquivalenceClass {
/// matter for equivalence purposes
///
/// TODO: use a HashSet for this instead of a Vec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, missed it

datafusion/physical-expr/src/equivalence/class.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome -- thanks @akurmustafa and @Weijun-H -- I am also running some benchmarks to see if it improves performance for planning

Self { exprs }
pub fn new(exprs: Vec<Arc<dyn PhysicalExpr>>) -> Self {
Self {
exprs: exprs.into_iter().collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@Weijun-H Weijun-H merged commit fdb221f into apache:main Dec 3, 2024
25 checks passed
@Weijun-H
Copy link
Member

Weijun-H commented Dec 3, 2024

Thanks @akurmustafa and @alamb

@alamb
Copy link
Contributor

alamb commented Dec 3, 2024

Hmm, I seem to see a slight slow down in planning 🤔 maybe worth some profiling

++ critcmp main refactor_hashset_exprs
group                                         main                                   refactor_hashset_exprs
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1471.3±32.58µs        ? ?/sec    1.01  1484.4±20.25µs        ? ?/sec
logical_select_all_from_1000                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00  1177.1±16.70µs        ? ?/sec    1.00  1172.2±16.21µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1137.3±16.03µs        ? ?/sec    1.01  1144.0±13.67µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1122.4±16.65µs        ? ?/sec    1.02  1144.8±25.55µs        ? ?/sec
physical_intersection                         1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_join_consider_sort                   1.00      3.2±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_join_distinct                        1.01  1122.7±16.06µs        ? ?/sec    1.00  1117.1±17.36µs        ? ?/sec
physical_many_self_joins                      1.00     17.0±0.09ms        ? ?/sec    1.00     17.0±0.11ms        ? ?/sec
physical_plan_clickbench_all                  1.00    225.4±1.82ms        ? ?/sec    1.01    227.7±1.99ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      3.3±0.04ms        ? ?/sec    1.01      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      4.3±0.05ms        ? ?/sec    1.01      4.4±0.06ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      4.5±0.06ms        ? ?/sec    1.01      4.5±0.05ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.06ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      4.2±0.05ms        ? ?/sec    1.01      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      4.4±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      4.3±0.06ms        ? ?/sec    1.03      4.4±0.17ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.04ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.8±0.06ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      4.5±0.05ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      3.3±0.05ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.07ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      5.0±0.06ms        ? ?/sec    1.01      5.0±0.06ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.8±0.09ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      3.9±0.05ms        ? ?/sec    1.01      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      3.6±0.06ms        ? ?/sec    1.02      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.05ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      4.7±0.07ms        ? ?/sec    1.02      4.8±0.11ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.9±0.10ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      3.5±0.05ms        ? ?/sec    1.02      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     16.6±0.21ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
...

@Weijun-H
Copy link
Member

Weijun-H commented Dec 4, 2024

Hmm, I seem to see a slight slow down in planning 🤔 maybe worth some profiling

++ critcmp main refactor_hashset_exprs
group                                         main                                   refactor_hashset_exprs
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1471.3±32.58µs        ? ?/sec    1.01  1484.4±20.25µs        ? ?/sec
logical_select_all_from_1000                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00  1177.1±16.70µs        ? ?/sec    1.00  1172.2±16.21µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1137.3±16.03µs        ? ?/sec    1.01  1144.0±13.67µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1122.4±16.65µs        ? ?/sec    1.02  1144.8±25.55µs        ? ?/sec
physical_intersection                         1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_join_consider_sort                   1.00      3.2±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_join_distinct                        1.01  1122.7±16.06µs        ? ?/sec    1.00  1117.1±17.36µs        ? ?/sec
physical_many_self_joins                      1.00     17.0±0.09ms        ? ?/sec    1.00     17.0±0.11ms        ? ?/sec
physical_plan_clickbench_all                  1.00    225.4±1.82ms        ? ?/sec    1.01    227.7±1.99ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      3.3±0.04ms        ? ?/sec    1.01      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      4.3±0.05ms        ? ?/sec    1.01      4.4±0.06ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      4.5±0.06ms        ? ?/sec    1.01      4.5±0.05ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.06ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      4.2±0.05ms        ? ?/sec    1.01      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      4.4±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      4.3±0.06ms        ? ?/sec    1.03      4.4±0.17ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.04ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.8±0.06ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      4.5±0.05ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      3.3±0.05ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.07ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      5.0±0.06ms        ? ?/sec    1.01      5.0±0.06ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.8±0.09ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      3.9±0.05ms        ? ?/sec    1.01      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      3.6±0.06ms        ? ?/sec    1.02      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.05ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      4.7±0.07ms        ? ?/sec    1.02      4.8±0.11ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.9±0.10ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      3.5±0.05ms        ? ?/sec    1.02      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     16.6±0.21ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
...

track #13638

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants