Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push join build table values as filter incase of duplicates #12225

Merged
merged 6 commits into from
Jun 14, 2022

Conversation

rohangarg
Copy link
Member

@rohangarg rohangarg commented Feb 2, 2022

Currently, the feature flag 'enableRewriteJoinToFilter' tries to rewrite a join as a filter when possible in inner joins. A correctness condition to do that rewrite requires all the keys on the build/right side of the join to be unique (since join would add multiplicity for duplicate keys).
If the values contain duplicates, we can still push them as filters but along with retaining the original join so that the correctness of results is maintained. This will help in reducing the data read by the system (when bitmap filtering is used) and also in less computation (the filtering is pushed below the join and even further whenever possible).

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@rohangarg
Copy link
Member Author

not stale

@stale
Copy link

stale bot commented May 13, 2022

This issue is no longer marked as stale.

@stale stale bot removed the stale label May 13, 2022
Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept looks good to me, but I thought the implementation was tough to follow and would be improved by introducing a holder class for the return value of convertJoinToFilter.

}
}

return Optional.of(Filters.and(filters));
return new NonnullPair<>(filters.isEmpty() ? Optional.empty() : Optional.of(Filters.and(filters)), dropClause);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we ever get filters = Optional.empty() and dropClause = true? If so what is the expectation in that case?

IMO, the NonnullPair is tough to think about here, so a special class would be better. That way it could have some javadocs about the expectations. Its constructor should also do any relevant invariant checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we ever get filters = Optional.empty() and dropClause = true? If so what is the expectation in that case?

Yes, that can happen if the joinableClause has conditions on columns which only have nulls in it. Currently, in such a case I add a FalseFilter in place of the joinable clause - does that seem ok?

IMO, the NonnullPair is tough to think about here, so a special class would be better. That way it could have some javadocs about the expectations. Its constructor should also do any relevant invariant checks.

Yes, completely agree. 👍 Thanks for the suggestion - I've made a holder instead of a pair and added documentation regarding its semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can happen if the joinableClause has conditions on columns which only have nulls in it. Currently, in such a case I add a FalseFilter in place of the joinable clause - does that seem ok?

It makes sense to me. If the behavior is described in javadocs for the new holder class, that's even better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for suggestion - added to the javadocs for holder class

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants