Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Implement cudf.DateOffset for months #6775

Merged

Conversation

brandon-b-miller
Copy link
Contributor

@brandon-b-miller brandon-b-miller commented Nov 16, 2020

Implements cudf.DateOffset - an object used for calendrical arithmetic, similar to pandas.DateOffset - for month units only.

Closes #6754

@brandon-b-miller brandon-b-miller added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. Cython labels Nov 16, 2020
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner November 16, 2020 16:25
@brandon-b-miller brandon-b-miller self-assigned this Nov 16, 2020
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov
Copy link

codecov bot commented Nov 16, 2020

Codecov Report

Merging #6775 (3dfbd68) into branch-0.18 (515a173) will increase coverage by 0.02%.
The diff coverage is 91.11%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.18    #6775      +/-   ##
===============================================
+ Coverage        82.01%   82.04%   +0.02%     
===============================================
  Files               96       96              
  Lines            16340    16384      +44     
===============================================
+ Hits             13402    13443      +41     
- Misses            2938     2941       +3     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/column.py 88.03% <ø> (+0.10%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 83.01% <88.23%> (+1.41%) ⬆️
python/cudf/cudf/__init__.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/column/datetime.py 89.42% <100.00%> (+0.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 515a173...3dfbd68. Read the comment docs.

@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 1, 2020
@brandon-b-miller brandon-b-miller changed the title [WIP] Implement cudf.DateOffset for months [REVIEW] Implement cudf.DateOffset for months Dec 1, 2020
@brandon-b-miller
Copy link
Contributor Author

This is ready for feedback

Comment on lines 343 to 346
if op == 'sub':
months = -self._months
else:
months = self._months
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if op == 'sub':
months = -self._months
else:
months = self._months
months = -self._months if op == "sub" else self._months

Copy link
Contributor

@galipremsagar galipremsagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. This is a good start, We can have support for the rest of the parameters too like minutes, seconds, microseconds, and nanoseconds in cudf.DateOffset by wrapping around a TimeDelta Scalar object internally. This can be tackled in a follow-up PR.

@brandon-b-miller
Copy link
Contributor Author

Looks good to me. This is a good start, We can have support for the rest of the parameters too like minutes, seconds, microseconds, and nanoseconds in cudf.DateOffset by wrapping around a TimeDelta Scalar object internally. This can be tackled in a follow-up PR.

I had a similar thought. I am thinking this is a little more complicated than it might seem however. In Pandas, DateOffset respects calendrical arithmetic rather than absolute time arithmetic. Adding DateOffset(days=1) to a date might not be the same as adding 24 hours in some cases, for instance if we're crossing a daylight savings time boundary the result will be an hour off. So we'd need to create some logic in the python layer to adjust for this and create the right timedelta. From that perspective it might be better suited to put this logic in libcudf.

@brandon-b-miller brandon-b-miller added feature request New feature or request non-breaking Non-breaking change labels Dec 2, 2020
@kkraus14 kkraus14 changed the base branch from branch-0.17 to branch-0.18 December 4, 2020 18:21
Comment on lines 337 to 338
def __init__(self, months):
self._months = months
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow the signature of Pandas DateOffset here: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html

We should also validate that we support the parameters that are used.

python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
Comment on lines 391 to 396
def _generate_column(self, size, op):
months = -self.months if op == "sub" else self.months
col = cudf.core.column.as_column(
months, dtype=np.dtype("int16"), length=size
)
return col
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does libcudf have an overload for the API that takes in a device scalar instead of a column? If not could you raise an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raised #6990

python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
@brandon-b-miller
Copy link
Contributor Author

@kkraus14 I think this is ready for another look.

python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/tools/datetimes.py Outdated Show resolved Hide resolved
if k in all_possible_kwargs:
# Months must be int16
dtype = "int16" if k == "months" else None
kwds[k] = cudf.Scalar(v, dtype=dtype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does using cudf.Scalar objects do to the __repr__?

Copy link
Contributor Author

@brandon-b-miller brandon-b-miller Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit:
looking into this again, this approach actually won't work for a few reasons. This trips the validation inside pd.DateOffset for non-integer years but not months for example. I suppose we will really need to override __setattr__ which I was hoping to avoid :(

@brandon-b-miller
Copy link
Contributor Author

Looking into these CI failures, but can't currently build the code. Stay tuned.

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge 6 - Okay to Auto-Merge and removed 3 - Ready for Review Ready for review by team labels Dec 15, 2020
@rapids-bot rapids-bot bot merged commit 1963111 into rapidsai:branch-0.18 Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Month addition or subtraction is inaccurate
5 participants