-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push join build table values as filter incase of duplicates #12225
Conversation
bfd161a
to
03d424c
Compare
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions. |
not stale |
This issue is no longer marked as stale. |
03d424c
to
439c959
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept looks good to me, but I thought the implementation was tough to follow and would be improved by introducing a holder class for the return value of convertJoinToFilter
.
} | ||
} | ||
|
||
return Optional.of(Filters.and(filters)); | ||
return new NonnullPair<>(filters.isEmpty() ? Optional.empty() : Optional.of(Filters.and(filters)), dropClause); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we ever get filters = Optional.empty()
and dropClause = true
? If so what is the expectation in that case?
IMO, the NonnullPair is tough to think about here, so a special class would be better. That way it could have some javadocs about the expectations. Its constructor should also do any relevant invariant checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we ever get filters = Optional.empty() and dropClause = true? If so what is the expectation in that case?
Yes, that can happen if the joinableClause has conditions on columns which only have nulls in it. Currently, in such a case I add a FalseFilter
in place of the joinable clause - does that seem ok?
IMO, the NonnullPair is tough to think about here, so a special class would be better. That way it could have some javadocs about the expectations. Its constructor should also do any relevant invariant checks.
Yes, completely agree. 👍 Thanks for the suggestion - I've made a holder instead of a pair and added documentation regarding its semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that can happen if the joinableClause has conditions on columns which only have nulls in it. Currently, in such a case I add a FalseFilter in place of the joinable clause - does that seem ok?
It makes sense to me. If the behavior is described in javadocs for the new holder class, that's even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks for suggestion - added to the javadocs for holder class
Conflicts: processing/src/main/java/org/apache/druid/segment/join/table/IndexedTableJoinable.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Currently, the feature flag 'enableRewriteJoinToFilter' tries to rewrite a join as a filter when possible in inner joins. A correctness condition to do that rewrite requires all the keys on the build/right side of the join to be unique (since join would add multiplicity for duplicate keys).
If the values contain duplicates, we can still push them as filters but along with retaining the original join so that the correctness of results is maintained. This will help in reducing the data read by the system (when bitmap filtering is used) and also in less computation (the filtering is pushed below the join and even further whenever possible).
This PR has: