-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: only infer self-join equality with a key over the base table #106485
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just need to fix some tests.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/memo/logical_props_builder.go
line 2553 at r1 (raw file):
} for leftTable, leftTableOrds := range leftTables { var baseTabFDs *props.FuncDepSet
The "resetting" of baseTabFDs
for every iteration of the outer loop is subtle and might be easy for someone to mess up in the future. Consider adding a comment to highlight this or a test that will fail if it's not reset.
pkg/sql/logictest/testdata/logic_test/join
line 1210 at r1 (raw file):
# Regression test for #106371 - only infer self-join equalities if the original # equality columns form a key over the *base* table, not just the join inputs.
Can you simplify this to the example in the commit message, or does it the reproduction require all this?
pkg/sql/logictest/testdata/logic_test/join
line 1234 at r1 (raw file):
statement ok CREATE TABLE queries_item_tags (
nit: remove trailing spaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/memo/logical_props_builder.go
line 2553 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The "resetting" of
baseTabFDs
for every iteration of the outer loop is subtle and might be easy for someone to mess up in the future. Consider adding a comment to highlight this or a test that will fail if it's not reset.
Since MakeTableFuncDep
caches the FDs anyway, I've just changed it to unconditionally initialize baseTabFDs
in the outer loop, and leave it untouched in the inner loop.
pkg/sql/logictest/testdata/logic_test/join
line 1210 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can you simplify this to the example in the commit message, or does it the reproduction require all this?
Good point, a modified version of the commit example will work for a logic test. Done.
pkg/sql/logictest/testdata/logic_test/join
line 1234 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: remove trailing spaces
Replaced the test case as mentioned above.
5ec29b2
to
7053d2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/memo/logical_props_builder.go
line 2553 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Since
MakeTableFuncDep
caches the FDs anyway, I've just changed it to unconditionally initializebaseTabFDs
in the outer loop, and leave it untouched in the inner loop.
Nice solution!
pkg/sql/opt/memo/testdata/logprops/join
line 3010 at r2 (raw file):
# from each input. norm SELECT * FROM xyz INNER JOIN xyz foo ON xyz.x = foo.x AND xyz.y = 1 AND foo.y = 2
We already had a test for this, but none of us realized the output was invalid... 🤷
Self-join equality inference was added by cockroachdb#105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL); INSERT INTO t106371 VALUES (1, 1), (1, 2); SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x; SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, one input has one row and the other de-duplicated by the `x` column and so `x` is a key over both inputs. However, the query as written will select different rows for each input - `a` will return the `(1, 2)` row, while `b` will return the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes cockroachdb#106371 Release note: None
7053d2f
to
1da6390
Compare
TFTR! bors r+ |
Timed out. |
bors retry |
Looks like the bors failure was transient, due to something in the test infra. |
Build succeeded: |
opt: only infer self-join equality with a key over the base table
Self-join equality inference was added by #105214, so that the
FuncDeps
for a self-join would include equalities between every pair of columns
at the same ordinal position in the base table if there was an equality
between key columns (also at the same ordinal position). However, the
key column check was performed using the FDs of the join inputs rather
than the base table's FDs. This could lead to incorrectly adding self-join
equality filters. For example, consider the following:
In the first query above,
a.x = b.x
does not consitute joining onkey columns. But in the second query, one input has one row and the
other de-duplicated by the
x
column and sox
is a key over bothinputs. However, the query as written will select different rows for
each input -
a
will return the(1, 2)
row, whileb
will returnthe
(1, 1)
row. Inferring aa.y = b.y
filter will incorrectly causethe join to return no rows.
This patch fixes the problem by requiring the initial self-join
equalities to form a key over the base table, not just the inputs
of the join.
Fixes #106371
Release note: None