[FEA] Unify distinct_count column/table APIs. #10183
Comments
Diving into this issue with @PointKernel led us to figure out this is more complex than it appears. In general,
These 4 knobs all interplay with each other, and the right way to do this would be:
@divyegala Yes, exactly! I agree this is a good plan for implementation.
Is your feature request related to a problem? Please describe.
While reviewing #10030, I found that the column and table algorithms for distinct_count have completely different flags for null and NaN handling. The column API has null_policy (include/exclude) and nan_policy (NaN is/isn't null), while the table API has null_equality (nulls are equal/unequal).
This also applies to unordered_distinct_count, introduced in #10030.
Describe the solution you'd like
The distinct count APIs for column/table should use the same flags (meaning that all three flags should probably be available to both APIs). This would also allow the column API to be a pass-through implementation of the table API, with a table composed of only that column, rather than having two implementations (table, column).
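To make the pass-through idea concrete, here is a minimal host-side sketch. It is not the libcudf implementation: the `sketch` namespace, the `std::optional<double>` element type, and the O(n^2) row comparison are illustrative stand-ins. The three flag names come from the text above, but how they combine is an assumption of this sketch: a row containing any null is skipped under EXCLUDE, and NaNs that are treated as valid compare equal to each other.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

namespace sketch {

// Flag names mirror the issue text; their combined semantics are assumptions.
enum class null_policy { EXCLUDE, INCLUDE };
enum class nan_policy { NAN_IS_NULL, NAN_IS_VALID };
enum class null_equality { EQUAL, UNEQUAL };

using element = std::optional<double>;  // std::nullopt stands in for a null
using column  = std::vector<element>;
using table   = std::vector<column>;    // columns of equal length

// An element is null if it is missing, or if it is NaN and NaN counts as null.
inline bool is_null(element const& e, nan_policy nans)
{
  return !e.has_value() || (nans == nan_policy::NAN_IS_NULL && std::isnan(*e));
}

inline bool elements_equal(element const& a, element const& b,
                           nan_policy nans, null_equality nulls)
{
  bool const a_null = is_null(a, nans);
  bool const b_null = is_null(b, nans);
  if (a_null || b_null) { return a_null && b_null && nulls == null_equality::EQUAL; }
  if (std::isnan(*a) && std::isnan(*b)) { return true; }  // assumption: valid NaNs are equal
  return *a == *b;
}

// Table version: counts distinct rows with a simple quadratic scan.
inline std::size_t distinct_count(table const& t, null_policy null_handling,
                                  nan_policy nans, null_equality nulls)
{
  if (t.empty()) { return 0; }
  std::size_t const num_rows = t.front().size();
  for (auto const& col : t) { assert(col.size() == num_rows); }

  std::vector<std::size_t> kept;  // indices of the first occurrence of each distinct row
  for (std::size_t r = 0; r < num_rows; ++r) {
    bool has_null = false;
    for (auto const& col : t) { has_null = has_null || is_null(col[r], nans); }
    // Assumption: under EXCLUDE, a row with any null is not counted.
    if (has_null && null_handling == null_policy::EXCLUDE) { continue; }

    bool seen = false;
    for (std::size_t k : kept) {
      bool same = true;
      for (auto const& col : t) {
        same = same && elements_equal(col[r], col[k], nans, nulls);
      }
      if (same) { seen = true; break; }
    }
    if (!seen) { kept.push_back(r); }
  }
  return kept.size();
}

// Column version: a pass-through to the table version with a one-column table.
inline std::size_t distinct_count(column const& c, null_policy null_handling,
                                  nan_policy nans, null_equality nulls)
{
  return distinct_count(table{c}, null_handling, nans, nulls);
}

}  // namespace sketch
```

With all three flags on both overloads, the column overload carries no logic of its own, so any change to null or NaN handling only has to be made in the table path.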