[BUG] collect_list (rolling_window, groupby) fails for empty input #7611
Comments
What does the failure manifest as? |
For the first example above, here is the error message:
|
Hmm. Looks like |
Ok, here are my findings regarding
The results are as expected. The assertion that this produces the wrong result is patently false.
In summary: both assertions regarding window functions are incorrect. I will raise a PR for fixing the result column type, so that this wasn't a complete waste of time. |
How does this apply to |
Updated the issue, only empty input actually causes an issue. |
The description is still not correct:
This implies that For completeness: The issue regarding non-repeated keys was pilot error. The expected result in your test case was constructed incorrectly. Both I will raise a PR for the rolling-window's result type after higher priority tasks are addressed. |
This issue has been labeled |
Fixes the group-by portion of #7611.

When `COLLECT_LIST()` or `COLLECT_SET()` aggregations are called on a grouped input, if the input column is empty, then one sees the following failure:

```
C++ exception with description "cuDF failure at: .../cpp/src/column/column_factories.cpp:67: make_empty_column is invalid to call on nested types" thrown in the test body.
```

The operation should have resulted in an empty `LIST` column. `make_empty_column()` does not support `LIST` types (in part because the `data_type` parameter does not capture the types of the child columns).

This commit fixes this by constructing the output column from the specified `values` input, but only for `COLLECT_LIST()` and `COLLECT_SET()`; other aggregation types are unchanged.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Conor Hoekstra (https://github.com/codereport)
- Nghia Truong (https://github.com/ttnghia)
- https://github.com/nvdbaranec

URL: #8279
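To illustrate why a bare type id is not enough to build an empty `LIST` column, here is a minimal standalone sketch. It does not use cuDF: `Column`, `TypeId`, `make_empty_from_type`, `empty_like` and `empty_collect_list_result` are hypothetical stand-ins written for this example, and the model is simplified (a real list column also carries an offsets child). It only shows the general idea of deriving the empty result from the `values` input rather than from a type id.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT cuDF types.
enum class TypeId { INT32, STRING, LIST };

struct Column {
  TypeId type;
  std::size_t size = 0;
  std::vector<Column> children;  // non-empty only for nested types
};

// Models a factory that only receives a type id. For LIST it has no way
// of knowing the child type, so it refuses (as the failure above does).
Column make_empty_from_type(TypeId type) {
  if (type == TypeId::LIST) {
    throw std::logic_error("a bare type id cannot describe a LIST's child type");
  }
  return Column{type, 0, {}};
}

// Builds an empty column with the same (possibly nested) structure as `input`.
Column empty_like(Column const& input) {
  Column out{input.type, 0, {}};
  for (auto const& child : input.children) {
    out.children.push_back(empty_like(child));
  }
  return out;
}

// Assumed shape of the fix: derive the empty LIST<T> result from `values`,
// so that the child type T is carried over from the input.
Column empty_collect_list_result(Column const& values) {
  Column out{TypeId::LIST, 0, {}};
  out.children.push_back(empty_like(values));
  return out;
}
```

For an empty `INT32` input, `empty_collect_list_result` yields an empty `LIST` whose single child is an empty `INT32` column, whereas `make_empty_from_type(TypeId::LIST)` throws.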
Fixes the rolling-window part of #7611.

All the rolling window functions return empty results when the input aggregation column is empty, just as they should. But the column types are incorrectly set to match the input type. While this is alright for `[MIN(), MAX(), LEAD(), LAG()]`, it is incorrect for some aggregations:

| Aggregation  | Input Types          | Output Type                       |
|--------------|----------------------|-----------------------------------|
| COUNT_VALID  | All types            | INT32                             |
| COUNT_ALL    | All types            | INT32                             |
| ROW_NUMBER   | All types            | INT32                             |
| SUM          | Numerics (e.g. INT8) | 64-bit promoted type (e.g. INT64) |
| SUM          | Chrono               | Same as input type                |
| SUM          | All else             | Unsupported                       |
| MEAN         | Numerics             | FLOAT64                           |
| MEAN         | Chrono               | FLOAT64                           |
| MEAN         | All else             | Unsupported                       |
| COLLECT_LIST | All types T          | LIST with child of type T         |

This mapping is congruent with `cudf::target_type_t` from `<cudf/detail/aggregation/aggregation.hpp>`.

This commit corrects the type of the output column that results from an empty input. It adds tests for all the combinations listed above.

Note: This is dependent on #8158, and should be merged after that is committed.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- https://github.com/nvdbaranec
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #8274
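The type mapping described in this PR can be sketched as a small standalone lookup. This is an illustration only, not cuDF code: the enums `TypeId` and `Agg`, the struct `OutType`, and the function `empty_result_type` are hypothetical names, and the set of input types is deliberately tiny. "Unsupported" combinations are modeled as an empty `std::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for illustration; NOT cuDF's real enums.
enum class TypeId { INT8, INT32, INT64, FLOAT64, TIMESTAMP_DAYS, STRING, LIST };
enum class Agg { MIN, MAX, LEAD, LAG, COUNT_VALID, COUNT_ALL, ROW_NUMBER,
                 SUM, MEAN, COLLECT_LIST };

struct OutType {
  TypeId id;
  std::optional<TypeId> child;  // set only when id == TypeId::LIST
};

constexpr bool is_integer(TypeId t) {
  return t == TypeId::INT8 || t == TypeId::INT32 || t == TypeId::INT64;
}
constexpr bool is_numeric(TypeId t) { return is_integer(t) || t == TypeId::FLOAT64; }
constexpr bool is_chrono(TypeId t) { return t == TypeId::TIMESTAMP_DAYS; }

// Returns the result type an empty input should produce, following the
// mapping described in the PR, or std::nullopt for "Unsupported".
std::optional<OutType> empty_result_type(Agg agg, TypeId input) {
  switch (agg) {
    case Agg::MIN:
    case Agg::MAX:
    case Agg::LEAD:
    case Agg::LAG:
      return OutType{input, std::nullopt};  // same as input type
    case Agg::COUNT_VALID:
    case Agg::COUNT_ALL:
    case Agg::ROW_NUMBER:
      return OutType{TypeId::INT32, std::nullopt};
    case Agg::SUM:
      if (is_integer(input)) return OutType{TypeId::INT64, std::nullopt};
      if (input == TypeId::FLOAT64) return OutType{TypeId::FLOAT64, std::nullopt};
      if (is_chrono(input)) return OutType{input, std::nullopt};
      return std::nullopt;
    case Agg::MEAN:
      if (is_numeric(input) || is_chrono(input)) {
        return OutType{TypeId::FLOAT64, std::nullopt};
      }
      return std::nullopt;
    case Agg::COLLECT_LIST:
      return OutType{TypeId::LIST, input};  // LIST with child of type T
  }
  return std::nullopt;
}
```

For example, `empty_result_type(Agg::SUM, TypeId::INT8)` gives `INT64`, and `empty_result_type(Agg::MEAN, TypeId::STRING)` gives no value, matching the "Unsupported" rows.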
Currently, `collect_list` is implemented for `rolling_window` and `groupby`. However, those implementations do not have tests for cases where the input column(s) are empty. For instance, below are some tests in which an exception is thrown: