[REVIEW] Add dictionary support to libcudf groupby functions #6585
Codecov Report
@@             Coverage Diff              @@
##           branch-0.18    #6585   +/-   ##
===============================================
+ Coverage        82.09%   82.11%   +0.02%
===============================================
  Files               97       97
  Lines            16474    16477       +3
===============================================
+ Hits             13524    13530       +6
+ Misses            2950     2947       -3
I realized that this PR should wait until #6392 is merged, as the two will likely have a lot of conflicts.
I cannot get around the CUDA 10.2 ptxas compile segfault documented at https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317. The PR includes logic to throw an exception if a dictionary column is used as values with one of the affected aggregation types. That logic is isolated behind a compile directive that checks for CUDA 10.2, so technically these aggregations work when compiled with 10.1 or 11.0, but the gtests have been commented out.
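For readers following along, here is a minimal sketch of what a version-gated guard of this kind could look like. It is not the code in this PR: `guard_dictionary_values`, `dictionary_agg_blocked`, and the `aggregation_is_affected` flag are hypothetical names, and the set of affected aggregations is deliberately left to the caller. The only real identifiers are nvcc's `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` macros.

```cpp
#include <stdexcept>

// Hypothetical flag: true only when the translation unit is built by nvcc 10.2.
// __CUDACC_VER_MAJOR__ / __CUDACC_VER_MINOR__ are defined by nvcc itself.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
  (__CUDACC_VER_MAJOR__ == 10) && (__CUDACC_VER_MINOR__ == 2)
constexpr bool dictionary_agg_blocked = true;
#else
constexpr bool dictionary_agg_blocked = false;
#endif

// Hypothetical helper: throws if a dictionary column is used as values with an
// aggregation that is affected by the compiler issue. Which aggregations count
// as "affected" is decided by the caller; it is an assumption of this sketch.
inline void guard_dictionary_values(bool values_are_dictionary,
                                    bool aggregation_is_affected,
                                    bool blocked = dictionary_agg_blocked)
{
  if (blocked && values_are_dictionary && aggregation_is_affected) {
    throw std::logic_error(
      "dictionary values are not supported for this aggregation with CUDA 10.2");
  }
}
```

Keeping the version check in a single constant means the rest of the code path stays identical across toolkits, and the guard can be removed in one place once the compiler issue is resolved.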
rerun tests
Damn. I missed it by a minute.
// prevent divide by zero error
if (group_size == 0 or group_size - ddof <= 0) return 0.0;

- ResultType mean = d_means.element<ResultType>(group_idx);
+ ResultType mean = d_means[group_idx];
This is sort of the reverse direction of what I've been doing. column_device_view's element() accessor is more helpful than a simple subscript. For example, with fixed point, element() will return the value with the scale applied, and that scale is stored once in the column. Any particular reason for this change?
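To make the distinction concrete, here is a toy host-side model of the point about fixed point. It is not cudf's `column_device_view`: `toy_fixed_point_column`, its members, and the use of `double` for the scaled value are all assumptions made for illustration. It only shows why an accessor that knows the column-level scale returns something different from a raw subscript on the underlying buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a fixed-point column (hypothetical, not a cudf type).
struct toy_fixed_point_column {
  std::vector<std::int32_t> rep;  // stored integer representation of each row
  int scale;                      // base-10 exponent, stored once for the column

  // Raw buffer access: subscripting this pointer yields the unscaled integer.
  std::int32_t const* data() const { return rep.data(); }

  // Accessor that applies the column's scale to the stored integer.
  double element(std::size_t i) const { return rep[i] * std::pow(10.0, scale); }
};
```

With `rep = {1234}` and `scale = -2`, `data()[0]` is the stored integer 1234, while `element(0)` is approximately 12.34, so code that switches from the accessor to a subscript would silently lose the scale for such columns.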
Reference #5963 Add dictionary support to groupby.