-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support bigframes.pandas.to_datetime for scalars, iterables and series. #372
Conversation
x.re_search(TIMEZONE_POS_REGEX), | ||
( | ||
( | ||
x.substr(0, x.length() - 6).to_timestamp(op.format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 is a bit of a "magic" number here. Please make some constants and comments explaining the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
) | ||
.cast(ibis_dtypes.Timestamp(timezone="UTC")) | ||
.cast(ibis_dtypes.int64) | ||
- x.substr(x.length() - 5, 2).cast(ibis_dtypes.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, this is more "magic" with regards to which substrings we're look at. Perhaps some helper functions would be useful too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise for the rest of this function. Please refactor so that it's easier to validate the correctness of each smaller part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
elif x.type() == ibis_dtypes.Timestamp(timezone="UTC"): | ||
return x | ||
elif x.type() != ibis_dtypes.timestamp: | ||
unit = op.unit if op.unit is not None else "ns" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's comment why we are making "ns" the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# Note: Due to an issue where casting directly to a non-UTC | ||
# timezone does not work, we first cast to UTC. This seems | ||
# to bypass a potential bug in Ibis's cast function, allowing | ||
# for subsequent casting to a non-UTC timezone. Further |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would it mean to cast to non-UTC timezone type in BigQuery? It only supports UTC at the data-type level, even though other timezones are supported for parsing and formatting.
Please raise an error if someone tries to cast to a non-UTC timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the confusion, will update the comment, this means without timezone. Basically this is for utc=True vs utc=False. Because of some unknown issue related to data type, potentially because of ibis, it's impossible to cast to the proper type when utc=False, unless cast it to utc timezone first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on tests it would appear although the result of int64 to_timestamp is in utc timezone, the cast function think the datatype is actually without timezone, and skip the cast, this is to fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated.
bigframes/core/tools/datetimes.py
Outdated
f"to datetime is not implemented. {constants.FEEDBACK_LINK}" | ||
) | ||
|
||
if ~isinstance(arg, bigframes.series.Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use bitwise negation to negate a boolean.
if ~isinstance(arg, bigframes.series.Series): | |
if not isinstance(arg, bigframes.series.Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's add a comment that this is intended to support pandas Series (and Index maybe?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bigframes/core/tools/datetimes.py
Outdated
# TODO: Currently, data upload is performed using pandas DataFrames | ||
# combined with the `read_pandas` method due to the BigFrames DataFrame | ||
# constructor's limitations in handling various data types. Plan to update | ||
# the upload process to utilize the BigPandas DataFrame constructor directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# the upload process to utilize the BigPandas DataFrame constructor directly | |
# the upload process to utilize the BigQuery DataFrame constructor directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bf_result = ( | ||
bpd.to_datetime(arg, utc=utc, unit=unit, format=format) | ||
.to_pandas() | ||
.astype("datetime64[ns, UTC]" if utc else "datetime64[ns]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fascinating. So utc=False will use DATETIME type in BigQuery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for utc=False, it will be later cast to DATETIME type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the example sql: SELECT
CAST(t0.0 AS DATETIME) AS Cast_0_ timestamp
FROM ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
https://screenshot.googleplex.com/34gywaqpL9yYGSm
https://screenshot.googleplex.com/9QCcdhPgiuwbRk7
https://screenshot.googleplex.com/3sjTQ3xwj3MWp9j
For string/datetime dtypes, currently are limited to utc=True as we have limited timezone support.
Fixes #<issue_number_goes_here> 🦕