-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: tuple IN (subquery) cannot be converted to lookup semi-join #43198
Comments
Hi @RaduBerinde , I am new to the organization. Can I take this issue? |
Sure. I would start by understanding some existing normalization rules, e.g. in |
@RaduBerinde I'm a little confused about what the ask is here - I thought it was to add an optimisation rule that rewrites IN queries using tuples to pull the tuples up into equalities, so
it seems that the rewritten query would still be using a hash join - have I misunderstood what the rewritten query should look like? |
The query plan depends on the table statistics. If both tables are about the same size, there is no reason to choose a lookup join. You'd have to make the |
…roachdb#43198) Initial WIP progress commit for sanity check of proposed changes Release note (<category, see below>): <what> <show> <why>
@RaduBerinde Ok, I think I've worked out what's going on - the IN statement is converted to an equality on a projected variable, which we can't currently detect and simply, resulting in the extra projections. I've written a new rule to detect such projections and try to convert the tuple equalities into multiple single-valued equalities, and hoist the projection above the join. I'm not sure whether it's the correct approach, though, as it produces some quite specific (and currently quite messy) code, so I'd appreciate some guidance that I'm on the right track before I go any further - would you be able to look at the above pull request and give me some feedback? |
Initial pull request was to the main repo, which it's definitely not ready for, so I've closed it - forked pull request is here RoryBlevins#1 |
…roachdb#43198) Initial WIP progress commit for sanity check of proposed changes Release note (<category, see below>): <what> <show> <why>
…roachdb#43198) Changes in response to code review
@RaduBerinde new updated pull request at RoryBlevins#2 , further feedback would be useful, thanks |
This looks great overall, thanks! I think you can post up a PR against the main repo for a full review. |
Great, thanks for your help |
@RaduBerinde Before I submit a proper PR, it looks like I have one failing data test - to me that looks like an improvement (at any rate, it seems considerably simpler as it's eliminated two projections), but this is my first time in the world of SQL optimisation, so I'd like some reassurance that it is actually an improvement before I rewrite someone else's test:
|
The plan is definitely an improvement. One other thing to consider is if the test is still testing what it was meant to test. In this case it looks like it's for the |
This commit adds a new rule to join normalisation that detects joins on tuple equalities between a tuple and a variable that is a projected tuple and splits them out into multiple equalities on the underlying columns, and hoists the project above the join. This allows us to simplify a number of queries where previously the projected tuple was preventing us merging projections. Resolves: cockroachdb#43198 Release justification: low-risk performance improvement
This commit adds a new rule to join normalisation that detects joins on tuple equalities between a tuple and a variable that is a projected tuple and splits them out into multiple equalities on the underlying columns, and hoists the project above the join. This allows us to simplify a number of queries where previously the projected tuple was preventing us merging projections. Resolves: cockroachdb#43198 Release justification: low-risk performance improvement
Hi @RaduBerinde @RoryBlevins , it seems like the PR for this was abandoned. Can I take over this issue? |
Please do - unfortunately I accepted a new job which prevents me from continuing to work on this - I would be happy to see someone finish this. |
@longnguyennn Sorry, but we ran into a case where this transformation would have helped a lot so I am starting to work on it ASAP. I will first try out @andy-kimball's suggestion from the old PR to try removing the tuples earlier, at the Select level. |
…e issue. Performance issue with previous delete statement is related to cockroachdb/cockroach#43198 Change-Id: I4a301526f209ef3fcaeca560d522c02aa28c20f8
…ormance issue. Performance issue with previous delete statement is related to cockroachdb/cockroach#43198 Change-Id: I4a301526f209ef3fcaeca560d522c02aa28c20f8
This change adds a rule that handles a case which prevents Exists subqueries from becoming lookup joins. Fixes cockroachdb#43198. Release note (performance improvement): certain queries containing `<tuple> IN (<subquery>)` conditions may run significantly faster.
63706: flowinfra: remove remnants of RunSyncFlow DistSQL RPC r=yuzefovich a=yuzefovich We recently removed `RunSyncFlow` RPC call of DistSQL, but a couple of things were missed. This commit cleans up the outbox based on that removal, removes `SetupSyncFlow` method as well as adds some comments. Release note: None 63780: opt: allow IN subquery to be converted to lookup join r=RaduBerinde a=RaduBerinde #### opt: add opttester facility to test placeholder assignment We like to assume that the result of building a memo with placeholders followed by AssignPlaceholders is equivalent to building the query with the values directly. This is not necessarily the case - it is possible that some normalization rules act on a higher part of the tree in a way that would not happen if we had fully normalized a lower part of the tree. This commit adds two new opttester directives: `assign-placeholders-norm` and `assign-placeholders-opt`. These take a query that has placeholders and simulates the prepared query planning path. We use these facilities to add some tests that reproduce a customer issue. Release note: None #### opt: allow IN subquery to be converted to lookup join This change adds a rule that handles a case which prevents Exists subqueries from becoming lookup joins. Fixes #43198. Release note (performance improvement): certain queries containing `<tuple> IN (<subquery>)` conditions may run significantly faster. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Radu Berinde <[email protected]>
This change adds a rule that handles a case which prevents Exists subqueries from becoming lookup joins. Fixes cockroachdb#43198. Release note (performance improvement): certain queries containing `<tuple> IN (<subquery>)` conditions may run significantly faster.
This change adds a rule that handles a case which prevents Exists subqueries from becoming lookup joins. Fixes cockroachdb#43198. Release note (performance improvement): certain queries containing `<tuple> IN (<subquery>)` conditions may run significantly faster.
This change adds a rule that handles a case which prevents Exists subqueries from becoming lookup joins. Fixes cockroachdb#43198. Release note (performance improvement): certain queries containing `<tuple> IN (<subquery>)` conditions may run significantly faster.
See example below. Best plan is a lookup join, but we don't get that with the semi-join. The reason is that each side is projecting a tuple, and we are constraining those to be equal. A rule that detects this projection and converts to multiple equalities should fix this.
The text was updated successfully, but these errors were encountered: