Add write_csv to DataFrame #1922

Merged
merged 3 commits into from
Mar 6, 2022

Conversation

matthewmturner
Contributor

Which issue does this PR close?

Closes #1777 task 7

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

github-actions bot added the "datafusion" (Changes in the datafusion crate) label on Mar 4, 2022
@matthewmturner
Contributor Author

@alamb would you mind checking this out to see if it is going in the right direction? I took the logic for writing CSVs, put it in a new function plan_to_csv within src/physical_plan/file_format/csv, and moved the tests there. Then I used that function in both ExecutionContext and DataFrame.
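
To illustrate the refactor described above, here is a minimal sketch of one shared helper called from two entry points. The names rows_to_csv, Context, and Frame are hypothetical stand-ins, not DataFusion's plan_to_csv, ExecutionContext, or DataFrame, and the CSV output is deliberately naive (no quoting or escaping).

```rust
/// Hypothetical shared helper: joins rows into naive CSV text
/// (no quoting or escaping). It plays the role that a single
/// shared function plays in the refactor described above.
fn rows_to_csv(rows: &[Vec<String>]) -> String {
    rows.iter()
        .map(|row| row.join(","))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Hypothetical stand-in for a context-style entry point.
struct Context;

impl Context {
    fn write_csv(&self, rows: &[Vec<String>]) -> String {
        // Delegates to the shared helper instead of owning the logic.
        rows_to_csv(rows)
    }
}

/// Hypothetical stand-in for a frame-style entry point.
struct Frame {
    rows: Vec<Vec<String>>,
}

impl Frame {
    fn write_csv(&self) -> String {
        // Same helper, so both entry points stay in sync.
        rows_to_csv(&self.rows)
    }
}
```

Because both methods delegate to the same function, tests only need to cover the helper once.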

It is currently failing because the signature of write_csv takes impl AsRef<str>, which I took from the existing implementation on ExecutionContext. That doesn't work here because a trait object can't have a method with a generic type parameter. Is there a reason we can't use &str? More details are at https://doc.rust-lang.org/error-index.html#E0038, and I don't think any of the proposed workarounds are good for our use case.
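
A minimal sketch of the E0038 problem, assuming a hypothetical Frame trait used behind Arc<dyn Frame> (this is not DataFusion's actual DataFrame trait). A method taking impl AsRef<str> is generic, so the trait could no longer be used as a trait object; taking &str keeps it usable.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a trait that is used as a trait object.
trait Frame {
    // A generic method such as
    //     fn write_csv(&self, path: impl AsRef<str>) -> String;
    // would make `dyn Frame` fail to compile with E0038, because the
    // vtable cannot hold one entry per possible type of `path`.
    // A plain `&str` parameter has a single concrete signature.
    fn write_csv(&self, path: &str) -> String;
}

// Hypothetical implementor; it only reports what it would do.
struct MemFrame;

impl Frame for MemFrame {
    fn write_csv(&self, path: &str) -> String {
        format!("would write CSV to {}", path)
    }
}

// Builds the trait object, which requires the trait to be usable as `dyn`.
fn make_frame() -> Arc<dyn Frame> {
    Arc::new(MemFrame)
}
```

Callers that hold a String can still pass &s or s.as_str(), so the loss of flexibility compared with impl AsRef<str> is small.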

@alamb
Contributor

alamb commented Mar 4, 2022

I took the logic for writing CSVs, put it in a new function plan_to_csv within src/physical_plan/file_format/csv, and moved the tests there. Then I used that function in both ExecutionContext and DataFrame.

Looks good to me 👍

It is currently failing because the signature of write_csv takes impl AsRef<str>, which I took from the existing implementation on ExecutionContext. That doesn't work here because a trait object can't have a method with a generic type parameter. Is there a reason we can't use &str?

I think taking &str is totally fine in the DataFrame trait

@matthewmturner
Contributor Author

@alamb thanks!

@jimexist
Member

jimexist commented Mar 5, 2022

Hi, thanks for the contribution.

I wonder if we can use an extension trait for the CSV writing method, so that users opt in to the method by importing the trait. The same goes for other potential methods, such as writing JSON files.

This way we can keep the minimal DataFrame definition small.
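
A minimal sketch of the extension-trait idea, using hypothetical names (Frame, CsvWriteExt, to_csv_string) rather than anything from DataFusion. The core type carries no CSV method of its own; the method only becomes callable where the trait is in scope. The CSV output is naive, with no quoting or escaping.

```rust
/// Hypothetical core type with no CSV-specific methods of its own.
struct Frame {
    rows: Vec<Vec<String>>,
}

/// Hypothetical extension trait. In a real crate it would live in its
/// own module, and users would opt in with a `use` of that trait.
trait CsvWriteExt {
    fn to_csv_string(&self) -> String;
}

impl CsvWriteExt for Frame {
    fn to_csv_string(&self) -> String {
        // Naive CSV: comma-joined fields, newline-joined rows.
        self.rows
            .iter()
            .map(|row| row.join(","))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

A JSON writer could follow the same pattern as a second trait, leaving the core type untouched.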

@matthewmturner
Contributor Author

@jimexist I'm not opposed, but would the idea be to do this only for methods that write? Or should it be generalized to all IO methods? I'm just wondering where we draw the line between core DataFrame methods and extensions.

@matthewmturner
Contributor Author

@jimexist what would you think about merging this as is and then tackling that point as part of the conversation on #1712, where there is interest in moving DataFrame from a trait to a struct?

alamb merged commit 0b9b30a into apache:master on Mar 6, 2022
Successfully merging this pull request may close these issues.

[EPIC] Improve DataFusions ability to work with files
3 participants