[REVIEW] Implement cudf.DateOffset for months #6775
Please update the changelog in order to start CI tests. View the gpuCI docs here.
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #6775 +/- ##
===============================================
+ Coverage 82.01% 82.04% +0.02%
===============================================
Files 96 96
Lines 16340 16384 +44
===============================================
+ Hits 13402 13443 +41
- Misses 2938 2941 +3
Continue to review full report at Codecov.
This is ready for feedback.
if op == 'sub':
    months = -self._months
else:
    months = self._months
- if op == 'sub':
-     months = -self._months
- else:
-     months = self._months
+ months = -self._months if op == "sub" else self._months
Looks good to me. This is a good start. We can also support the rest of the parameters, such as minutes, seconds, microseconds, and nanoseconds, in cudf.DateOffset by wrapping a TimeDelta Scalar object internally. This can be tackled in a follow-up PR.
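One possible shape for that follow-up, sketched in plain Python rather than cudf: keep the calendar-dependent months separate from the fixed-length units, and fold the fixed-length units into a single duration. The helper name `split_offset` and the `FIXED_UNITS` tuple are hypothetical, and `datetime.timedelta` stands in for the TimeDelta Scalar mentioned above.

```python
from datetime import timedelta

# Fixed-length units that datetime.timedelta accepts directly.
# Note: timedelta has no nanosecond resolution, so nanoseconds are
# left out of this stand-in sketch.
FIXED_UNITS = ("days", "hours", "minutes", "seconds", "microseconds")


def split_offset(**kwds):
    """Split offset keywords into (months, fixed-length duration).

    Hypothetical helper: months depend on the calendar, so they are kept
    as a plain integer; every other unit has a fixed length and can be
    summed into one timedelta.
    """
    months = kwds.pop("months", 0)
    unknown = set(kwds) - set(FIXED_UNITS)
    if unknown:
        raise TypeError(f"unexpected keyword(s): {sorted(unknown)}")
    return months, timedelta(**kwds)


months, delta = split_offset(months=2, minutes=90, seconds=30)
```

Keeping the two parts separate matters because a month has no fixed length, so it cannot be folded into the duration.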
I had a similar thought. I am thinking this is a little more complicated than it might seem however. In Pandas,
def __init__(self, months):
    self._months = months
We should follow the signature of pandas DateOffset here: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html
We should also validate that we support the parameters that are used.
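A minimal sketch of what this could look like, assuming the pandas-style signature `DateOffset(n=1, normalize=False, **kwds)`. The class name `MonthOnlyOffset` and both keyword sets are hypothetical, and the choice to reject `n != 1` and `normalize=True` is only to keep the sketch short, not a statement about what cudf does.

```python
# Relative (pluralized) keywords accepted by pandas.DateOffset.
PANDAS_RELATIVE_KWARGS = {
    "years", "months", "weeks", "days", "hours",
    "minutes", "seconds", "microseconds", "nanoseconds",
}
# Assumption for this sketch: only months are implemented so far.
SUPPORTED_KWARGS = {"months"}


class MonthOnlyOffset:
    """Hypothetical offset that mirrors the pandas signature but only
    implements the `months` keyword."""

    def __init__(self, n=1, normalize=False, **kwds):
        if n != 1 or normalize:
            raise NotImplementedError("n and normalize are not supported")
        unknown = set(kwds) - PANDAS_RELATIVE_KWARGS
        if unknown:
            # Not a known keyword in this sketch at all.
            raise TypeError(f"unexpected keyword(s): {sorted(unknown)}")
        unsupported = set(kwds) - SUPPORTED_KWARGS
        if unsupported:
            # Valid in pandas, but not implemented here yet.
            raise NotImplementedError(
                f"keyword(s) not yet supported: {sorted(unsupported)}"
            )
        self.months = kwds.get("months", 0)


offset = MonthOnlyOffset(months=3)
```

Raising `NotImplementedError` for valid-but-unsupported keywords keeps the error distinct from a plain typo, which gets a `TypeError`.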
def _generate_column(self, size, op):
    months = -self.months if op == "sub" else self.months
    col = cudf.core.column.as_column(
        months, dtype=np.dtype("int16"), length=size
    )
    return col
Does libcudf have an overload for the API that takes in a device scalar instead of a column? If not could you raise an issue?
raised #6990
Co-authored-by: Keith Kraus <[email protected]>
Co-authored-by: Keith Kraus <[email protected]>
@kkraus14 I think this is ready for another look.
if k in all_possible_kwargs:
    # Months must be int16
    dtype = "int16" if k == "months" else None
    kwds[k] = cudf.Scalar(v, dtype=dtype)
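Since the snippet above casts months to int16, one option is to check the value before the cast so that an out-of-range input fails loudly instead of wrapping. This is a plain-Python sketch; the helper name `check_months` is hypothetical and not part of the PR.

```python
# Bounds of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def check_months(value):
    """Return `value` if it is an integer that fits in int16.

    Hypothetical pre-cast check: rejects non-integers (including bool)
    and values outside the int16 range.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("months must be an integer")
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError("months does not fit in int16")
    return value


checked = check_months(18)
```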
What does using cudf.Scalar objects do to the __repr__?
edit:
Looking into this again, this approach actually won't work for a few reasons. For example, it trips the validation inside pd.DateOffset for non-integer years but not for months. I suppose we will really need to override __setattr__, which I was hoping to avoid :(
Looking into these CI failures, but can't currently build the code. Stay tuned.
Implements cudf.DateOffset, an object used for calendrical arithmetic similar to pandas.DateOffset, for month units only.
Closes #6754
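To illustrate what calendrical month arithmetic means, here is a small pure-Python sketch. It does not use cudf; the function name `add_months` is hypothetical. It follows the pandas.DateOffset convention of clamping the day to the last day of the target month, and it treats subtraction as adding a negated month count, as in the `op == "sub"` branch discussed in the review.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by a whole number of months (may be negative).

    If the original day does not exist in the target month, it is
    clamped to that month's last day.
    """
    # Work with a zero-based month index so divmod handles year rollover,
    # including negative shifts.
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


leap = add_months(date(2020, 1, 31), 1)      # 2020 is a leap year
back = add_months(date(2020, 3, 15), -3)     # subtraction as negative shift
```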