Grouping by frequency and resampling #9178
rerun tests
Currently, when I run it with any operation, I get:
In [10]: gdf.groupby(cudf.Grouper(key="Publish date", freq="1h", label="left", closed="left")).sum()
/raid/wangm/dev/rapids/cudf/python/cudf/cudf/core/frame.py:3076: FutureWarning: keep_index is deprecated and will be removed in the future.
FutureWarning,
Out[10]:
                       ID  Price
Publish date
2000-01-01 12:00:00     1     30
2000-01-01 13:00:00  <NA>   <NA>
2000-01-01 14:00:00  <NA>   <NA>
...
This issue is not specific to this PR, so I am raising an issue to track it instead: #9667
if ordered and labels is not None:
    if len(set(labels)) != len(labels):
This could all be a single conditional
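One way to read this suggestion is to fold the two nested checks into one condition. The sketch below is only an illustration: `check_labels` is a hypothetical helper, and the error message is an assumption, since the body of the inner `if` is not shown in the diff excerpt.

```python
def check_labels(labels, ordered):
    """Hypothetical helper: reject duplicate labels when bins are ordered."""
    # A single combined condition replaces the two nested `if` statements.
    # Short-circuiting keeps `set(labels)` from running when labels is None.
    if ordered and labels is not None and len(set(labels)) != len(labels):
        # Illustrative message; the real one is not shown in the excerpt.
        raise ValueError("labels must be unique if ordered=True")


check_labels(["a", "b"], ordered=True)    # unique labels: passes silently
check_labels(["a", "a"], ordered=False)   # duplicates tolerated when unordered
check_labels(None, ordered=True)          # no labels supplied: nothing to check
```

Behaviour is unchanged relative to the nested form, because the inner check only ever ran when the outer one was true.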
I deleted a few of the comments I made here. I already planned to do some refactoring of `cut`, and reading this I see quite a bit more that should be done, but it's out of scope for this PR. No need to change anything; we can revisit it at a later point.
@gpucibot merge
Closes #6255, #8416
This PR implements two related features:

- `freq=` argument to `cudf.Grouper`
- `.resample()` API

Either operation results in a `_Resampler` object that represents the data resampled into "bins" of a particular frequency. The following operations are supported on resampled data:

- `min()` and `max()`, performed bin-wise
- `ffill()` and `bfill()` methods: forward and backward filling in the case of upsampling data
- `asfreq()`: returns the resampled data as a Series or DataFrame

These are all best understood by example:
First, we create a time series with 1-minute intervals:
Downsampling to 3-minute intervals, followed by a "sum" aggregation:
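The original snippet for this step did not survive on this page. As a stand-in, here is a plain-Python model of what downsampling with a "sum" aggregation does to the data. It is not the cudf implementation and does not call cudf. The nine-point series, the `downsample_sum` helper, and the choice to anchor bins at the first timestamp are all assumptions made for illustration. Bins are left-closed and left-labelled, matching the `label="left", closed="left"` arguments seen earlier in the conversation.

```python
from datetime import datetime, timedelta

start = datetime(2000, 1, 1)
# Illustrative data (an assumption): nine points, one per minute, values 0..8.
series = [(start + timedelta(minutes=i), i) for i in range(9)]


def downsample_sum(points, width):
    """Sum values into left-closed, left-labelled bins of the given width."""
    origin = points[0][0]  # assumption: bins are anchored at the first timestamp
    bins = {}
    for ts, value in points:
        # Index of the half-open interval [left, left + width) that contains ts.
        k = (ts - origin) // width
        left = origin + k * width
        bins[left] = bins.get(left, 0) + value
    return sorted(bins.items())


result = downsample_sum(series, timedelta(minutes=3))
for left, total in result:
    print(left, total)
```

Each output row is labelled by the left edge of its 3-minute bin. Unlike a real resampler, this sketch does not emit rows for empty bins.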
Upsampling to 30-second intervals:
Upsampling to 30-second intervals, followed by a forward fill:
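The snippets for the two upsampling steps are also missing from this page. The sketch below models the behaviour in plain Python rather than calling cudf. The `upsample` helper and the three-point series are hypothetical. With `fill=False` it mimics what the description attributes to `asfreq()`, where new slots have no value. With `fill=True` it mimics `ffill()`, where each new slot takes the most recent observed value.

```python
from datetime import datetime, timedelta

start = datetime(2000, 1, 1)
# Illustrative data (an assumption): three points, one per minute, values 0..2.
series = [(start + timedelta(minutes=i), i) for i in range(3)]


def upsample(points, step, fill=False):
    """Place points on a finer regular grid; optionally forward-fill the gaps."""
    known = dict(points)
    ts, last = points[0][0], points[-1][0]
    out, previous = [], None
    while ts <= last:
        if ts in known:
            previous = known[ts]      # remember the latest real observation
            value = previous
        else:
            # New slot created by upsampling: empty unless forward-filling.
            value = previous if fill else None
        out.append((ts, value))
        ts += step
    return out


plain = upsample(series, timedelta(seconds=30))
filled = upsample(series, timedelta(seconds=30), fill=True)
for (ts, a), (_, b) in zip(plain, filled):
    print(ts, a, b)
```

Here `None` stands in for the `<NA>` a GPU dataframe would show in the unfilled slots.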