Export to csv and json #42

Merged: 5 commits, Jan 20, 2021
@MrAlecJohnson MrAlecJohnson commented Jan 8, 2021

Adds export functions and tests to meet issue #5. The options and formatting are set up so that if you read a table using a schema, export it to CSV or JSON, then read the exported file with the same schema, the two tables will be identical.
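As a rough illustration of the round-trip property described above, here is a minimal sketch using plain pandas only. The `schema` dict and `read_with_schema` helper are hypothetical stand-ins for the package's schema-driven reader, not its actual API.

```python
import io

import pandas as pd

# Hypothetical stand-in for a schema: column name -> pandas dtype.
schema = {"id": "Int64", "name": "string", "score": "float64"}


def read_with_schema(buffer):
    """Read CSV text from a buffer, casting columns to the schema dtypes."""
    return pd.read_csv(buffer, dtype=schema)


# Read a table using the schema (the second row has a missing name).
original = read_with_schema(io.StringIO("id,name,score\n1,a,0.5\n2,,1.25\n"))

# Export it to CSV, then read the exported data back with the same schema.
exported = io.StringIO()
original.to_csv(exported, index=False)
exported.seek(0)
reread = read_with_schema(exported)

# Raises AssertionError if the two tables differ in values or dtypes.
pd.testing.assert_frame_equal(original, reread)
```

The comparison checks dtypes as well as values, so a column that silently changed type during export would fail it.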

This doesn't implement the option to conform a table to a schema as part of exporting. Currently the export functions don't give much control over the format of the exported data, particularly for dates and times. But I think this is okay, as the way the data is written to file is less important than how the package reads those files.

These export functions mostly use the default pandas export options. However, period types have to be reformatted as date strings so Arrow can read them. Exporting dates and times to JSON also requires explicitly converting them to strings, both to avoid epoch values (which Arrow reads differently from pandas) and to avoid problems with missing values.
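The conversions described above could look something like the sketch below. It is not the code from this PR: the helper name `stringify_temporal_columns` and the format strings are assumptions chosen for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "month": pd.period_range("2021-01", periods=3, freq="M"),
        "created": pd.to_datetime(
            ["2021-01-08 12:18:00", None, "2021-01-20 11:04:00"]
        ),
    }
)


def stringify_temporal_columns(frame):
    """Return a copy with period and datetime columns converted to strings."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.PeriodDtype):
            # Period -> timestamp at the start of the period -> date string.
            out[name] = col.dt.to_timestamp().dt.strftime("%Y-%m-%d")
        elif pd.api.types.is_datetime64_any_dtype(col.dtype):
            # Explicit strings instead of leaving the encoding to to_json.
            out[name] = col.dt.strftime("%Y-%m-%d %H:%M:%S")
    return out


# Newline-delimited JSON, one record per line.
json_lines = stringify_temporal_columns(df).to_json(orient="records", lines=True)
records = [json.loads(line) for line in json_lines.splitlines() if line]
```

In this sketch the missing timestamp in the second row is written as a JSON `null` rather than a placeholder string.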

@MrAlecJohnson MrAlecJohnson requested a review from a team January 8, 2021 12:18
@isichei isichei left a comment

Looking good. Minor comments

tests/test_export.py (outdated)
tests/test_export.py (outdated)
arrow_pd_parser/export.py (outdated)
pd_date_type=date_args,
pd_timestamp_type=date_args,
)
# Write to StringIO then convert to BytesIO so Arrow can read it
I like this comment. Succinctly points users to a gotcha.

@MrAlecJohnson MrAlecJohnson merged commit cd9bd96 into main Jan 20, 2021
@MrAlecJohnson MrAlecJohnson deleted the export-csv-json branch January 20, 2021 11:04