Assign nullable dtypes to dataframe columns #7

Merged: hamima-halim merged 5 commits into main from numpy_nullable on Apr 17, 2024


hamima-halim
Contributor

WIP.

A PR to fix #4:

  • pandas adventures
  • timestamp adventures
  • sneaky types adventures

Even though parquet files carry explicit per-column dtype metadata, pandas overrides that metadata for integer columns that contain nulls and loads them as floats. Down the line, this causes overflow errors when numpy tries to recast the epoch timestamps into datetimes.
More info: https://pandas.pydata.org/docs/user_guide/integer_na.html#nullable-integer-data-type
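To illustrate the coercion described above, here is a minimal sketch using an in-memory series rather than a parquet file. The timestamp values and variable names are made up for illustration and are not taken from this PR's diff.

```python
import pandas as pd

# Hypothetical epoch timestamps in seconds, with one missing value.
raw = [1713304320, None, 1713304380]

# Default inference: the missing value forces the whole column to float64.
as_float = pd.Series(raw)

# Nullable integer dtype: values stay integers and the gap is stored as pd.NA.
as_nullable = pd.Series(raw, dtype="Int64")

# Converting the nullable column to datetimes turns the gap into NaT.
as_datetime = pd.to_datetime(as_nullable, unit="s")

print(as_float.dtype)     # float64
print(as_nullable.dtype)  # Int64
```

When reading parquet directly, pandas 2.0+ also accepts `pd.read_parquet(path, dtype_backend="numpy_nullable")`, which asks for nullable dtypes up front. Whether this PR uses that option or assigns dtypes column by column is not visible in the excerpt here.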

Tests to come.

@hamima-halim hamima-halim requested a review from idreyn April 16, 2024 22:12
@@ -58,12 +58,23 @@ def _local_save(s3_key, stop_events):
def _process_arrival_departure_times(pq_df: pd.DataFrame) -> pd.DataFrame:
    """Process and collate arrivals and departures for a timetable of events.

    Before: TODO add example
    After: TODO add example
    This does two things:
Member


Thanks so much for documenting everything 👏

@devinmatte
Member

devinmatte commented Apr 16, 2024

I'd be fine with merging this as is and adding the tests in a second PR, but I'm also happy to re-review later if you'd rather include the tests in this same PR.

@hamima-halim hamima-halim merged commit 8c8f46d into main Apr 17, 2024
2 checks passed
@hamima-halim hamima-halim deleted the numpy_nullable branch April 17, 2024 00:34