Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix csv writing of timestamps to show timezone. #849

Merged
merged 5 commits into from
Nov 16, 2021

Conversation

novemberkilo
Copy link
Contributor

Which issue does this PR close?

Closes #778

What changes are included in this PR?

  • Round-trip test of csv writing and reading of timestamps
  • Fix to csv writer so that timestamps are written out in UTC.

Are there any user-facing changes?

Yes. This changes the behaviour of the csv writer.

// @alamb @bjchambers

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 23, 2021
@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2021

Codecov Report

Attention: Patch coverage is 23.25581% with 33 lines in your changes missing coverage. Please review.

Project coverage is 82.28%. Comparing base (898924f) to head (e83b111).
Report is 2721 commits behind head on master.

Files Patch % Lines
arrow/src/csv/writer.rs 24.39% 31 Missing ⚠️
arrow/src/compute/kernels/temporal.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #849      +/-   ##
==========================================
- Coverage   82.45%   82.28%   -0.17%     
==========================================
  Files         168      168              
  Lines       48231    48575     +344     
==========================================
+ Hits        39767    39971     +204     
- Misses       8464     8604     +140     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @novemberkilo !

arrow/src/csv/writer.rs Outdated Show resolved Hide resolved
@novemberkilo novemberkilo changed the title Fix csv writing of timestamps to be in UTC. Fix csv writing of timestamps to show timezone. Oct 31, 2021
@novemberkilo
Copy link
Contributor Author

@alamb @bjchambers I have made changes to the csv writer according to the proposal.

While doing so, I have noticed what I think is a bug in the temporal utilities in the kernel. Essentially, variations relating to daylight savings are not recognised at the moment. I have a failing test on a branch here. I came up across this issue during this work as and had to implement https://github.com/novemberkilo/arrow-rs/blob/master/arrow/src/compute/kernels/temporal.rs#L98-L110 to correctly derive the offset for a timezone.

I figure I should fix this bug separately -- should I file an issue to describe it and then put up a fix against that? Thanks.

@alamb
Copy link
Contributor

alamb commented Nov 2, 2021

I figure I should fix this bug separately -- should I file an issue to describe it and then put up a fix against that? Thanks.

I think that would make the most sense to me @novemberkilo -- thanks

@alamb
Copy link
Contributor

alamb commented Nov 9, 2021

Thanks @novemberkilo -- I plan to review this more carefully hopefully today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @novemberkilo this is looking quite nice.

arrow/src/compute/kernels/temporal.rs Show resolved Hide resolved
arrow/src/csv/writer.rs Outdated Show resolved Hide resolved
arrow/src/csv/writer.rs Outdated Show resolved Hide resolved
arrow/src/csv/writer.rs Show resolved Hide resolved
arrow/src/csv/writer.rs Outdated Show resolved Hide resolved
arrow/src/csv/writer.rs Show resolved Hide resolved
arrow/src/csv/writer.rs Outdated Show resolved Hide resolved
@novemberkilo
Copy link
Contributor Author

Thanks much for the feedback @alamb -- I have reviewed your comments and made changes accordingly.

Something to note, instead of adding Z to indicate UTC, we now consistently use the same format string and show UTC as +00:00

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thank you @novemberkilo

let expected = vec![Some(3), Some(2), Some(1)];
assert_eq!(actual, expected);
let actual = c3.into_iter().collect::<Vec<_>>();
let expected = nanoseconds.into_iter().map(|x| Some(x)).collect::<Vec<_>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@novemberkilo
Copy link
Contributor Author

novemberkilo commented Nov 15, 2021

Thanks @alamb - once this is merged I will put up a PR for #901 that builds from this

@alamb
Copy link
Contributor

alamb commented Nov 15, 2021

I'll plan to merge this tomorrow morning unless anyone else has any comments

@alamb alamb merged commit bfc95be into apache:master Nov 16, 2021
@alamb
Copy link
Contributor

alamb commented Nov 16, 2021

Thanks again @novemberkilo

alamb pushed a commit that referenced this pull request Nov 20, 2021
* Write timestamps (in csvs) with timezone.

* More tests and more verbose naming.

* Please linter.

* Please clippy.

* Cleanup based on review feedback.
alamb added a commit that referenced this pull request Nov 22, 2021
* Write timestamps (in csvs) with timezone.

* More tests and more verbose naming.

* Please linter.

* Please clippy.

* Cleanup based on review feedback.

Co-authored-by: Navin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timestamps don't round-trip through CSV writer/reader
4 participants