Expose conditional join size calculation #8928
@jlowe it looks like the hash join APIs for the output size are all detail APIs, so I assume the same is fine here. Let me know if that's not the case and I can plumb this up higher in the hierarchy.
The APIs are public, part of the |
5890929 to 87701fe
I forgot that the hash join class itself was public, and none of the free functions in cuDF for joins had this parameter added, so I was confused (such as |
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #8928 +/- ##
===============================================
Coverage ? 10.65%
===============================================
Files ? 114
Lines ? 19080
Branches ? 0
===============================================
Hits ? 2033
Misses ? 17047
Partials ? 0
…tional_join_size # Conflicts: # cpp/src/join/conditional_join.cuh # cpp/src/transform/compute_column.cu
The failing test is intended to test the cudf behavior when an invalid table reference is specified. It asks for a RIGHT table column in an expression that only takes one table. In the past it would silently ignore table references and act as if it were referencing the LEFT table, but that no longer seems to be the case. If the behavior is explicitly undefined for this case we can remove the test, but it would be interesting to know why the behavior changed.
…apper and validate that expressions evaluated on a single table don't reference the right table.
Discussed offline, I'm changing the behavior to always throw an exception in |
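The check described in this exchange (rejecting a RIGHT table reference in an expression that is evaluated on a single table) can be sketched roughly as follows. This is a minimal illustration only: `table_reference`, `column_reference`, and `validate_single_table` are hypothetical names, not cuDF's actual AST types, and `std::invalid_argument` is an assumed stand-in for whatever exception the real code throws.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration; not cuDF's actual AST types.
enum class table_reference { LEFT, RIGHT };

struct column_reference {
  int column_index;
  table_reference table_source;
};

// Throws if an expression that will be evaluated against a single table
// contains any reference to the RIGHT table, instead of silently treating
// that reference as if it pointed at the LEFT table.
inline void validate_single_table(std::vector<column_reference> const& refs)
{
  for (auto const& ref : refs) {
    if (ref.table_source == table_reference::RIGHT) {
      throw std::invalid_argument("single-table expression references the RIGHT table");
    }
  }
}
```

Failing loudly here means a test like the one above can assert on the exception rather than on whatever data the LEFT table happens to hold.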
Generally +1. A couple of clarification questions, really.
This was an enjoyable, and informative read. I'd like to look more closely at the kernels themselves, later on.
LGTM!
@gpucibot merge
This adds Java bindings to the conditional join output size APIs added in #8928. In preparation for adding Java APIs for hash-join output sizing, it also refactors the names of the conditional join APIs to avoid overloading by adding "conditional" to the method names.

Authors:
- Jason Lowe (https://github.com/jlowe)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #9002
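For readers unfamiliar with what an output size API for a conditional join has to report, the following host-side sketch counts output rows by brute force. It is an assumption-laden model, not the cuDF implementation (which runs on the GPU): `join_kind`, `conditional_join_size`, the use of `int` rows, and the particular set of join kinds shown are all hypothetical choices made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model for illustration only.
enum class join_kind { INNER, LEFT, LEFT_SEMI, LEFT_ANTI };

// Counts the rows a conditional join of `left` and `right` would produce,
// where `pred(l, r)` decides whether a pair of rows matches.
inline std::size_t conditional_join_size(std::vector<int> const& left,
                                         std::vector<int> const& right,
                                         std::function<bool(int, int)> const& pred,
                                         join_kind kind)
{
  std::size_t total = 0;
  for (int l : left) {
    // Number of right rows that satisfy the predicate for this left row.
    std::size_t matches = 0;
    for (int r : right) {
      if (pred(l, r)) { ++matches; }
    }
    switch (kind) {
      case join_kind::INNER: total += matches; break;
      // A left join keeps unmatched left rows as a single output row.
      case join_kind::LEFT: total += (matches > 0 ? matches : 1); break;
      case join_kind::LEFT_SEMI: total += (matches > 0 ? 1 : 0); break;
      case join_kind::LEFT_ANTI: total += (matches == 0 ? 1 : 0); break;
    }
  }
  return total;
}
```

Knowing this count up front lets a caller allocate the output buffers once before running the join itself.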
Resolves #8918 by providing a new API for getting the output size for conditional joins (except full joins). This PR removes the unnecessary conditional_join.cuh header and inlines the logic into the conditional_join.cu file where it is used, and adds the new logic into that file as well. The public APIs are now exposed in conditional_join.hpp.

Adding the join size calculation also revealed a couple of bugs in the conditional join tests that were hiding a real bug in the conditional join implementation. The main test bug was the use of std::equal with the actual result as the first iterator, so if the actual result was empty it was never compared against the expected result (even if it was nonempty). This bug masked a couple of minor errors in the expected outputs encoded in the test. These are now fixed. This bug was also hiding a deeper issue where the AST device code was always using the left row index to pull data for the left hand operand to binary operations, even when the lhs was actually a column from the right table. That bug is now fixed as well.

Contributes to #8145.
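The std::equal pitfall described above is easy to reproduce in isolation. The sketch below uses hypothetical helper names (`matches_buggy`, `matches_checked`) and plain `std::vector<int>` data rather than the actual cuDF test code; it only illustrates why an empty actual result passes the three-iterator comparison.

```cpp
#include <algorithm>
#include <vector>

// Buggy pattern: the compared range is [actual.begin(), actual.end()), so an
// empty `actual` compares zero elements and the check passes vacuously.
// (If `actual` were longer than `expected`, this would read past the end.)
inline bool matches_buggy(std::vector<int> const& actual, std::vector<int> const& expected)
{
  return std::equal(actual.begin(), actual.end(), expected.begin());
}

// Safer pattern: the four-iterator overload (C++14) returns false when the
// two ranges have different lengths.
inline bool matches_checked(std::vector<int> const& actual, std::vector<int> const& expected)
{
  return std::equal(actual.begin(), actual.end(), expected.begin(), expected.end());
}
```

With an empty actual result and a nonempty expected result, the first helper reports a match and the second does not, which is the difference between a test that hides the bug and one that catches it.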