Add examples of DataFrame::write* methods without S3 dependency #8606
async fn main() -> Result<(), DataFusionError> {
    let ctx = SessionContext::new();
    let local = Arc::new(LocalFileSystem::new_with_prefix("./").unwrap());
    let local_url = Url::parse("file://local").unwrap();
I think we can simplify this by removing the LocalFileSystem registration. We can show the default behavior here, and show how to extend it in the S3 example. Let's let first-time users climb the stairs one step at a time.
Thanks, that is definitely simpler.
Thanks for the effort. LGTM.
Thank you @devinjdangelo -- I think this looks great and thank you for doing it.
Can you please also add an entry to the readme here: https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples#single-process ?
    .write_table("test", DataFrameWriteOptions::new())
    .await?;

df.clone()
I wonder if it would be valuable to update one of these calls to show how to use DataFrameWriteOptions, something like:
// you can use DataFrameWriteOptions to control how the dataframe output is created
// for example:
....
?
Just pushed up an update with a DataFrameWriteOptions example and added the new example to the README file
Thank you everyone 🚀
df.clone()
    .write_csv(
        "./datafusion-examples/test_csv/",
        // DataFrameWriteOptions contains options which control how data is written
❤️
Which issue does this PR close?
Closes #8551
Rationale for this change
We currently do not have an example of DataFrame::write_table, nor examples of other DataFrame::write* methods that do not depend on an external S3 bucket.
What changes are included in this PR?
Adds examples of DataFrame::write_table and other write* methods using the LocalFileSystem object store.
Are these changes tested?
Via existing tests
Are there any user-facing changes?
No