[FEA] Month addition or subtraction is inaccurate #6754
Comments
FYI, we have this functionality in the C++ layer: https://docs.rapids.ai/api/libcudf/nightly/group__datetime__compute.html#gac1481e3f5e0f1cb431cb12aa75ec8ef5
I believe pandas exposes this functionality as a module-level function with
I can look into plumbing the libcudf function through here.
Thanks Brandon! So just to be clear, this C++ functionality is not available in Python, right? Our team is a quant team and does not have the skill set to look into C++, so it'd be awesome if Python could have the same functionality!
Right - we'd write a cuDF Python API that'd be close (if not identical) to pandas and produce Cython bindings that call the C++ under the hood. That said, this is at the "seems like it will theoretically work" phase of development, and I have not at all scoped out what caveats there might be to this.
Implements `cudf.DateOffset`, an object used for calendrical arithmetic, similar to `pandas.DateOffset`, for month units only.
Closes #6754
Authors:
- brandon-b-miller
- Keith Kraus
Approvers:
- GALI PREM SAGAR
- Keith Kraus
URL: #6775
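A minimal sketch of the month-offset semantics this change refers to, shown with `pandas.DateOffset` because the commit message describes `cudf.DateOffset` as similar to it. Whether `cudf.DateOffset(months=n)` takes the same arguments and clamps month ends the same way is an assumption here, not something confirmed in this thread.

```python
import pandas as pd

# Sample dates chosen to include month-end edge cases.
dates = pd.Series(pd.to_datetime(["2020-01-31", "2020-03-31", "2019-11-01"]))

# Calendar-aware addition: when the target month is shorter,
# the day is clamped to the last valid day of that month.
plus_one = dates + pd.DateOffset(months=1)

# Subtraction uses the same offset object.
minus_one = dates - pd.DateOffset(months=1)

print(plus_one.dt.strftime("%Y-%m-%d").tolist())
print(minus_one.dt.strftime("%Y-%m-%d").tolist())
```

Under the assumption above, replacing `pd` with `cudf` would be the GPU equivalent; treat that as something to verify against the cuDF documentation.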
Hi @roe246, this should be available in the coming nightlies as
I wish I could use cuDF to do month addition or subtraction accurately, because a month can have 28, 29, 30, or 31 days.
The ideal feature would take a datetime column and add or subtract any number of months to produce a new column, in the cleanest and simplest way to code and run.
For example,
DF = {'id': ['a','b','c'], 'old_date': ['2019-11-01', '2019-12-01', '2020-01-01']}
month_add = 1
I need DF['new_date'] = DF['old_date'] + month_add
so
DF = {'id': ['a','b','c'], 'old_date': ['2019-11-01', '2019-12-01', '2020-01-01'], 'new_date': ['2019-12-01', '2020-01-01', '2020-02-01']}
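For comparison, the requested behaviour can be sketched in pandas with `pd.DateOffset`. This is only an illustration of the desired result on the CPU; the cuDF spelling is not shown because it is not confirmed at this point in the thread.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": ["a", "b", "c"],
     "old_date": ["2019-11-01", "2019-12-01", "2020-01-01"]}
)

# Parse the strings into a real datetime column first.
df["old_date"] = pd.to_datetime(df["old_date"])

month_add = 1
# Adding a month offset shifts each date by calendar months, not by a fixed number of days.
df["new_date"] = df["old_date"] + pd.DateOffset(months=month_add)

# Formatting back to strings keeps the zero-padded month.
new_date_str = df["new_date"].dt.strftime("%Y-%m-%d").tolist()
print(new_date_str)
```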
As a workaround, I have to convert the datetime to a string, work on the year and month separately, and then reassemble the date. It takes a lot of extra time to split the dataframe into single-digit and double-digit month subsets, process each independently to get the correct datetime format, and append the dataframes back together. Also, single- and double-digit months cannot be uniformly calculated and concatenated into YYYY-MM-DD format; the result is ‘2020-1-01’ rather than the correct ‘2020-01-01’.
Pain points:
np.timedelta(month=n) does not account for months having 28, 29, 30, or 31 days; it adds a month as an average number of days per month, which is a problem in NumPy datetime calculations.
dateutil.relativedelta(months=+n) does not work with RAPIDS because of an issue broadcasting this specific package/function.
Calculating ‘YYYY’ and ‘MM’ separately and concatenating the strings back to ‘YYYY-MM-01’ yields ‘M’ instead of ‘MM’ when MM < 10, so we had to separate single-digit ‘M’ and double-digit ‘MM’ dataframes and process them ad hoc to add the ‘0’ back to the single ‘M’.
This approach is extremely slow because of splitting the dataframe and appending the pieces back together, especially when scaling up or expanding the cuDF dataframe based on other columns.
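The year/month bookkeeping described in these pain points can be sketched in plain Python without splitting the data by month width. The helper name `add_months` is hypothetical and is not part of cuDF, NumPy, or dateutil. It runs element by element on the CPU, so it only illustrates the arithmetic and zero-padding, not a fast GPU solution.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by a whole number of calendar months (hypothetical helper)."""
    # Count months from year 0 so that divmod handles year rollover,
    # including negative offsets.
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the length of the target month (28, 29, 30, or 31).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


old_dates = ["2019-11-01", "2019-12-01", "2020-01-01"]
# isoformat() always zero-pads, so '2020-1-01' cannot occur.
new_dates = [add_months(date.fromisoformat(s), 1).isoformat() for s in old_dates]
print(new_dates)

# If the string is assembled by hand, a format spec pads the month uniformly.
first_of_month = f"{2020:04d}-{1:02d}-01"
print(first_of_month)
```

Using an integer month count and a `:02d` format spec removes the need to treat single-digit and double-digit months as separate cases.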