Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ParquetExec::builder(), deprecate ParquetExec::new #10636

Merged
merged 3 commits into from
May 29, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 23, 2024

Which issue does this PR close?

Part of #10546

Rationale for this change

While working on #10549 it cumbersome to create a ParquetExec -- there are a few fields on ParquetExec::new that need default values that make it slightly confusing to use.

What changes are included in this PR?

  1. Add ParquetExecBuilder and ParquetExec::builder() that handle defaults
  2. Add an example and a bunch of docs
  3. Deprecate ParquetExec::new
  4. Update the code to use the new API

You can see this in action here #10618

Are these changes tested?

Yes, covered by existing tests and new doc example tests

Are there any user-facing changes?

Better docs / examples / builder.

ParquetExec::new() is now deprecated

@github-actions github-actions bot added the core Core DataFusion crate label May 23, 2024
@alamb alamb closed this May 23, 2024
@alamb alamb reopened this May 23, 2024
@alamb alamb force-pushed the alamb/parquet_builder branch 4 times, most recently from 6cefd61 to 313252e Compare May 24, 2024 09:38
@alamb alamb mentioned this pull request May 24, 2024
2 tasks
@alamb alamb changed the title Add ParquetExec::builder API Add ParquetExec::builder(), deprecate ParquetExec::new May 28, 2024
@alamb alamb marked this pull request as ready for review May 28, 2024 13:29
Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor 👍 Those docs are also brilliant!

A tiny suggestion, we can add #[must_use] to those methods on ParquetExecBuilder to prevent a builder is not .build() at the end.

@alamb
Copy link
Contributor Author

alamb commented May 29, 2024

Thanks @waynexia !

A tiny suggestion, we can add #[must_use] to those methods on ParquetExecBuilder to prevent a builder is not .build() at the end.

I added what I think you are referring to in 70a14dc (though I am not quite sure). Let me know if you had a different idea and I will make that change.

@alamb alamb merged commit 77352b2 into apache:main May 29, 2024
23 checks passed
@alamb alamb deleted the alamb/parquet_builder branch May 29, 2024 20:46
@waynexia
Copy link
Member

Something like https://github.com/apache/arrow-rs/blob/12c0d007947a2acb01738a737c3be24f779749f8/arrow-flight/src/arrow.flight.protocol.rs#L501-L523

But after some search, I realized that #[must_use] and the builder pattern don't have a tight association (don't remember why I have such an illusion... 🤣

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
)

* Add `ParquetExec::builder()`, deprecate `ParquetExec::new`

* Add a #[must_use]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants