Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pruning serialization #1941

Merged
merged 3 commits into from
Mar 8, 2022
Merged

Pruning serialization #1941

merged 3 commits into from
Mar 8, 2022

Conversation

thinkharderdev
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, we don't preserve pruning predicates in ParquetExec when serializing for Ballista. This was probably the case because the PruningPredicate requires the original logical Expr to reconstruct. This PR preserves the original Expr in PruningPredicate so we can serialize it.

What changes are included in this PR?

  1. Preserve the original logical Expr in the PruningPredicate struct.2.
  2. Serialize the pruning Expr in the protobuf message which encodes the ParquetExec
  3. Reconstruct the PruninPredicate when deserializing the ParquetExec.

Are there any user-facing changes?

No

No

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Mar 7, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @thinkharderdev

@@ -143,6 +145,7 @@ impl PruningPredicate {
schema,
predicate_expr,
required_columns,
logical_expr: expr.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth considering changing the signature of this function from `

pub fn try_new(expr: &Expr, schema: SchemaRef) -> Result<Self> {

to take the Expr directly rather than clone within

pub fn try_new(expr: Expr, schema: SchemaRef) -> Result<Self> {

The rationale would be if the caller already has a Expr they could pass it in rather than forcing a copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could always do that as a follow on PR as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I should have some time tomorrow to see if I can do this relatively easily. If it turns out to be more involved then we can create a follow up task for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thinkharderdev -- to keep the code flowing I am going to merge this PR as is and we can do the cleanup in a follow on PR. Let me know if you don't get a chance and I can file a follow on ticket to do so

Copy link
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great. Thanks!

@alamb alamb merged commit e714a4a into apache:master Mar 8, 2022
@thinkharderdev thinkharderdev deleted the pruning-serialization branch March 8, 2022 16:35
@alamb
Copy link
Contributor

alamb commented Mar 8, 2022

This PR seems to have a logical conflict with #1887 -- I am working to fix that

@alamb
Copy link
Contributor

alamb commented Mar 8, 2022

Fix in #1958

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants