-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create simple storage writer #826
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mrchtr!
Could you add it to the integration test as well?
The component is missing a README which makes the docs build fail. Not sure why this is not caught by the pre-commit hook? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matthias! Left a few comments
The readme was generated, but I have forgotten to push it. |
Co-authored-by: Philippe Moussalli <[email protected]>
Co-authored-by: Philippe Moussalli <[email protected]>
Co-authored-by: Philippe Moussalli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
components/write_to_file/Dockerfile
Outdated
@@ -0,0 +1,28 @@ | |||
FROM --platform=linux/amd64 python:3.8-slim as base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to start using newer python versions since 3.8 will reach end-of-life
before you know it. (https://devguide.python.org/versions/)
### Produces | ||
|
||
|
||
**This component does not produce data.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct bit it might confuse users.
Maybe word it like: This component does not produce a dataframe for downstream processing.
components/write_to_file/README.md
Outdated
| argument | type | description | default | | ||
| -------- | ---- | ----------- | ------- | | ||
| path | str | Path to store the dataset, whether it's a local path or a cloud storage bucket, must be specified. A separate filename will be generated for each partition. If you are using the local runner and export the data to a local directory, ensure that you mount the path to the directory using the `--extra-volumes` argument. | / | | ||
| format | str | Format for storing the dataframe can be either `csv` or `parquet`. | csv | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the CSV write headers ? Does it use comma
as a delimiter ? Should we allow for specifying this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSV is written with headers and comma separated. I'll add to the description.
I would not make this configurable. I see this one more as an example component. For more custom export methods I would recommend implementing a python component instead, no?
Wondering: did you test it with a remote path ? |
I've tested it quickly and it is working :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Added `write_to_file` to both getting started guides. #826 already added the component to the example pipeline. fix #825 --------- Co-authored-by: Robbe Sneyders <[email protected]>
Write component that writes a dask dataframe to a file, either csv or parquet.
Fix #824