refactor EliminateDuplicatedExpr optimizer pass to avoid clone #10218

Merged: 5 commits, Apr 26, 2024

Conversation

Lordworms
Contributor

Which issue does this PR close?

part of #9637

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the optimizer (Optimizer rules) label on Apr 24, 2024

// use this structure to avoid initial clone
#[derive(Eq, Clone, Debug)]
struct SortExprWrapper {
Contributor Author

Wrap the Expr in a Wrapper to support specialized comparison

Contributor

This is very clever 👏
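
A minimal sketch of the wrapper idea discussed in this thread, not the PR's actual code. It assumes a simplified, hypothetical `Expr` enum in place of DataFusion's `Expr`, and the helper `count_distinct_sort_keys` is made up for illustration. The wrapper owns the expression (so no clone is needed up front) and implements `PartialEq` and `Hash` so that sort options are ignored:

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified stand-in for DataFusion's `Expr` (assumption).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Sort { expr: Box<Expr>, asc: bool, nulls_first: bool },
}

// Newtype that owns the expression and overrides how it is compared.
#[derive(Eq, Clone, Debug)]
struct SortExprWrapper {
    expr: Expr,
}

impl PartialEq for SortExprWrapper {
    fn eq(&self, other: &Self) -> bool {
        match (&self.expr, &other.expr) {
            // Two sort expressions are equal when their inner expressions
            // match, regardless of `asc` / `nulls_first`.
            (Expr::Sort { expr: left, .. }, Expr::Sort { expr: right, .. }) => left == right,
            _ => self.expr == other.expr,
        }
    }
}

impl Hash for SortExprWrapper {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only what `eq` looks at, so equal wrappers hash identically.
        match &self.expr {
            Expr::Sort { expr, .. } => expr.hash(state),
            other => other.hash(state),
        }
    }
}

// Hypothetical helper: number of distinct sort keys, ignoring sort options.
fn count_distinct_sort_keys(exprs: Vec<Expr>) -> usize {
    exprs
        .into_iter()
        .map(|expr| SortExprWrapper { expr })
        .collect::<HashSet<_>>()
        .len()
}
```

Keeping `Hash` consistent with `PartialEq` is what lets the wrapper be used as a set key; hashing the full expression, options included, would put `a ASC` and `a DESC` in different buckets and defeat the deduplication.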

.iter()
.map(|e| match e {
Expr::Sort(ExprSort { expr, .. }) => {
Expr::Sort(ExprSort::new(expr.clone(), true, false))
Contributor Author

Avoid cloning the expression to build the normalized sort key here

Ok(None)
} else {
Ok(Some(LogicalPlan::Sort(Sort {
expr: dedup_expr.into_iter().cloned().collect::<Vec<_>>(),
Contributor Author

avoid another clone here

input: sort.input.clone(),
fetch: sort.fetch,
})))
let mut index_set = IndexSet::new(); // use IndexSet instead of HashSet to preserve order
Contributor Author

Use an IndexSet to preserve the original order of the sort expressions

Contributor

This is also quite clever

I think you can avoid a Vec here if you skip normalized_sort_keys and create unique_exprs directly, as you did below.

Something like:

let unique_exprs: Vec<Expr> = sort
    .expr
    .into_iter()
    // use SortExpr wrapper to ignore sort options
    .map(|e| SortExprWrapper { expr: e })
    .collect::<IndexSet<_>>()
    .into_iter()
    .map(|wrapper| wrapper.expr)
    .collect();
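
For readers without the `indexmap` crate at hand, the behavior this suggestion relies on (first occurrence wins, original order kept, values moved rather than cloned) can be sketched with the standard library alone. This is a stand-in for `IndexSet`, not what the PR uses, and `dedup_preserving_order` is a hypothetical helper name:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Remove duplicates from `items`, keeping the first occurrence of each value
/// and the original relative order. Values are moved, never cloned.
fn dedup_preserving_order<T: Eq + Hash>(items: Vec<T>) -> Vec<T> {
    let mut first_seen: HashMap<T, usize> = HashMap::with_capacity(items.len());
    for (position, item) in items.into_iter().enumerate() {
        // `or_insert` leaves an existing entry untouched, so a later
        // duplicate is simply dropped and the first position is kept.
        first_seen.entry(item).or_insert(position);
    }

    // Restore the original order by sorting on the recorded position.
    let mut ordered: Vec<(T, usize)> = first_seen.into_iter().collect();
    ordered.sort_by_key(|(_, position)| *position);
    ordered.into_iter().map(|(item, _)| item).collect()
}
```

Consuming the input with `into_iter()` is what removes the need for `.cloned()`: each value is moved into the map and then moved back out. An `IndexSet` gives the same result in one pass because it records insertion order itself, which is why it fits better in the optimizer rule.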

Lordworms marked this pull request as ready for review on April 24, 2024, 20:35
Contributor

@alamb left a comment

Thank you @Lordworms -- this is really nice and quite clever

I have a few suggestions on how to make this PR better, but I think we could also do them as a follow-on.

@@ -35,78 +35,107 @@ impl EliminateDuplicatedExpr {
Self {}
}
}

// use this structure to avoid initial clone
Contributor

Suggested change
// use this structure to avoid initial clone
/// Wraps an `Expr` to support specialized comparison.
///
/// Ignores the sort options for `SortExpr` because, if the expression is the same,
/// the subsequent exprs are never matched.
///
/// For example, `ORDER BY a ASC, a DESC` is the same
/// as `ORDER BY a ASC` (the second `a DESC` is never compared).
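
The claim in the suggested doc comment can be checked with a small sketch, assuming rows modeled as hypothetical `(a, tag)` tuples and a stable sort: a tie-breaker on the same key is only consulted when the first comparison is already `Equal`, and in that case it is `Equal` in either direction.

```rust
use std::cmp::Ordering;

/// Compare two rows by `a ASC`, then by `a DESC` as a tie-breaker.
fn cmp_a_asc_then_a_desc(left: &(i32, char), right: &(i32, char)) -> Ordering {
    left.0
        .cmp(&right.0)
        // Only runs when the `a` values are equal, so the reversed
        // comparison is `Equal` as well and changes nothing.
        .then_with(|| right.0.cmp(&left.0))
}

/// Equivalent of `ORDER BY a ASC`.
fn sort_by_a_asc(mut rows: Vec<(i32, char)>) -> Vec<(i32, char)> {
    rows.sort_by(|left, right| left.0.cmp(&right.0));
    rows
}

/// Equivalent of `ORDER BY a ASC, a DESC`.
fn sort_by_a_asc_a_desc(mut rows: Vec<(i32, char)>) -> Vec<(i32, char)> {
    rows.sort_by(cmp_a_asc_then_a_desc);
    rows
}
```

Both functions return the same ordering for any input, which is why dropping the later `a DESC` key is safe.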

Contributor Author

Sure!


alamb changed the title from "refactor eliminate duplicated expr to avoid clone" to "refactor EliminateDuplicatedExpr optimizer pass to avoid clone" on Apr 25, 2024
Contributor

alamb commented Apr 26, 2024

Thanks again @Lordworms -- this looks great
