-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql, opt: support hint to disallow zigzag join, support bounded staleness checks #68141
Conversation
d9e95dd
to
9d1f34b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! The main question I have is whether the interaction with LIMITs here matches the KV semantics we need for these queries, which is that they are only ever allowed to touch a single range.
Reviewed 11 of 11 files at r1, 10 of 10 files at r2, 4 of 4 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @mgartner, @otan, and @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
# Scan is limited to one row, so it succeeds. statement ok SELECT * FROM t AS OF SYSTEM TIME with_max_staleness('1ms') WHERE k IS NULL LIMIT 1
This one is interesting. From KV's perspective, this is a multi-row scan with a limit. That means that the scan can span multiple ranges, but we expect it to short-circuit once it hits the first row. In practice, we expect that to very often be in the first range we hit, but there's no guarantee of that - we could have empty ranges. And more fundamentally, KV is going to see this and notice that the scan crosses range boundaries even before considering the limit, so it will reject the bounded staleness attempt here because it can't hit the single-range fast-path. So because of that, I'm not sure we'll be able to support it.
pkg/sql/opt/memo/expr.go, line 364 at r1 (raw file):
// Empty returns true if there are no flags set. func (sf *ScanFlags) Empty() bool { return !sf.NoIndexJoin && !sf.ForceIndex && !sf.NoZigzagJoin
minor nit: Should we keep these conditions in the same order as the fields?
pkg/sql/opt/memo/expr_format.go, line 418 at r1 (raw file):
b.WriteString(" no-zigzag-join") } tp.Childf("flags:%s", b.String())
Should we b.WriteString("flags:")
up above and then use (Node).Child
to avid the extra string formatting in (Node).Childf
?
Also, do we want this if there are no flags? I believe that was the behavior before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r1, 10 of 10 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @nvanbenschoten, @otan, and @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This one is interesting. From KV's perspective, this is a multi-row scan with a limit. That means that the scan can span multiple ranges, but we expect it to short-circuit once it hits the first row. In practice, we expect that to very often be in the first range we hit, but there's no guarantee of that - we could have empty ranges. And more fundamentally, KV is going to see this and notice that the scan crosses range boundaries even before considering the limit, so it will reject the bounded staleness attempt here because it can't hit the single-range fast-path. So because of that, I'm not sure we'll be able to support it.
Your comment brought up some concerns regarding column families. From KV's perspective is a scan of a single row that has multiple column families considered a multi-row scan? Are all column families of a row guaranteed to be in the same range? Is it worthwhile to have test cases with multiple column families here?
pkg/sql/opt/memo/expr_format.go, line 418 at r1 (raw file):
Also, do we want this if there are no flags? I believe that was the behavior before.
I think the if !private.Flags.Empty() { .. }
handles that.
pkg/sql/opt/xform/select_funcs.go, line 921 at r1 (raw file):
} if scanPrivate.Flags.NoZigzagJoin {
nit: you could combine this conditional with the one above
pkg/sql/opt/xform/select_funcs.go, line 1281 at r1 (raw file):
} if scanPrivate.Flags.NoZigzagJoin {
nit: you could combine with above
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
Disregard this - this should be covered by column family randomization. |
9d1f34b
to
d77e90f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Fixed the LIMIT issue.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
So because of that, I'm not sure we'll be able to support it.
Good catch! This now returns an error.
From KV's perspective is a scan of a single row that has multiple column families considered a multi-row scan?
This is an important question, and I don't know the answer. @nvanbenschoten?
pkg/sql/opt/memo/expr.go, line 364 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
minor nit: Should we keep these conditions in the same order as the fields?
Done.
pkg/sql/opt/memo/expr_format.go, line 418 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Also, do we want this if there are no flags? I believe that was the behavior before.
I think the
if !private.Flags.Empty() { .. }
handles that.
Done. (and @mgartner is right that !private.Flags.Empty()
takes care of the second point)
pkg/sql/opt/xform/select_funcs.go, line 921 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: you could combine this conditional with the one above
Done.
pkg/sql/opt/xform/select_funcs.go, line 1281 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: you could combine with above
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 16 files at r4, 8 of 10 files at r5, 4 of 4 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @mgartner, @otan, and @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
From KV's perspective is a scan of a single row that has multiple column families considered a multi-row scan? Are all column families of a row guaranteed to be in the same range?
Good question. All keys for a row are stored in a single range, so point lookups for rows with multiple column families are allowed in this first stage of bounded staleness reads. If you're interested, this is all enforced in keys.EnsureSafeSplitKey
.
Is it worthwhile to have test cases with multiple column families here?
Yes, this is a good idea.
pkg/sql/opt/memo/expr.go, line 357 at r6 (raw file):
// ForceIndex forces the use of a specific index (specified in Index). // ForceIndex and NoIndexJoin cannot both be set at the same time. ForceIndex bool
Are NoZigzagJoin
and ForceIndex
also mutually exclusive?
pkg/sql/opt/memo/expr_format.go, line 404 at r6 (raw file):
if private.Flags.NoIndexJoin { b.WriteString(" no-index-join") } else if private.Flags.ForceIndex {
nit: is it worth baking in the assumption that NoIndexJoin
and ForceIndex
aren't both set here? That seems like the kind of constraint that we're better off not relying on for correctness in a formatting function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 11 files at r1, 16 of 16 files at r4, 10 of 10 files at r5, 1 of 4 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @rytaft)
pkg/sql/opt/memo/expr_format.go, line 417 at r4 (raw file):
} if private.Flags.NoZigzagJoin { b.WriteString(" no-zigzag-join")
Can you have a NoIndexJoin hint and a NoZigzagJoin hint? I guess it makes sense, just checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't even bother looking at the bounded staleness stuff.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @rytaft)
pkg/sql/opt/memo/expr.go, line 357 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't think so. It's valid to want to scan one index but not perform a zigzag join. I don't see anything in the implementation of these flags that would prevent that. |
Release note (sql change): Added support for a new index hint, NO_ZIGZAG_JOIN, which will prevent the optimizer from planning a zigzag join for the specified table. The hint can be used in the same way as other existing index hints. For example, `SELECT * FROM table_name@{NO_ZIGZAG_JOIN};`.
This commit updates the execbuilder to include the tree.SemaContext as one of its arguments. This will allow it to use the AsOfSystemTime information in a following commit. Release note: None
This commit adds checks in the execbuilder to ensure that bounded staleness can only be used with queries that touch at most one row. It also applies hints for such queries in the optbuilder to ensure that an invalid plan is not produced. In particular, the hints ensure that no plans with an index join or zigzag join will be produced. Fixes cockroachdb#67558 Release note: None
d77e90f
to
a08e6fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @nvanbenschoten, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/as_of, line 214 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
From KV's perspective is a scan of a single row that has multiple column families considered a multi-row scan? Are all column families of a row guaranteed to be in the same range?
Good question. All keys for a row are stored in a single range, so point lookups for rows with multiple column families are allowed in this first stage of bounded staleness reads. If you're interested, this is all enforced in
keys.EnsureSafeSplitKey
.Is it worthwhile to have test cases with multiple column families here?
Yes, this is a good idea.
Ah that's great news. Column family randomization should cover the tests as @mgartner mentioned.
pkg/sql/opt/memo/expr.go, line 357 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't think so. It's valid to want to scan one index but not perform a zigzag join. I don't see anything in the implementation of these flags that would prevent that.
^ what he said
pkg/sql/opt/memo/expr_format.go, line 417 at r4 (raw file):
Previously, cucaroach (Tommy Reilly) wrote…
Can you have a NoIndexJoin hint and a NoZigzagJoin hint? I guess it makes sense, just checking.
yea, you can have them together.
pkg/sql/opt/memo/expr_format.go, line 404 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: is it worth baking in the assumption that
NoIndexJoin
andForceIndex
aren't both set here? That seems like the kind of constraint that we're better off not relying on for correctness in a formatting function.
Good point. Fixed.
I'm going to go ahead and merge this. Sorry for the incoming rebase, @cucaroach! bors r+ |
haha I was about to move AsOfSystemTime to the EvalContext (#68219) which avoid your second commit. Oh well. |
bors r- be my guest! |
Canceled. |
No sweat, who ever commits first wins! |
im fine if you merge this, or pull the commit out for your PR. i was going to make it the same PR on top of nathan's PR, which is a while away and i don't want to block this! |
Ok, in that case I think I'll just merge this since I don't have much time today and I'm going on vacation tomorrow. I can easily revert the changes from my second commit once you add the AOST to EvalContext. Thanks! bors r+ |
Gah changed my mind since I see @otan 's PR is basically ready to go. bors r- |
Canceled. |
Going with the original plan. Will merge now and revert as needed later. bors r+ |
Build succeeded: |
Well that was an emotional rollercoaster... It's great to show bors that it's not the only one that can sensationalize a merge! 😉 |
😂 Sorry for all the noise! |
opt,sql: support hint to disallow zigzag join
Release note (sql change): Added support for a new index hint,
NO_ZIGZAG_JOIN
, which will prevent the optimizer from planning azigzag join for the specified table. The hint can be used in the
same way as other existing index hints. For example,
SELECT * FROM table_name@{NO_ZIGZAG_JOIN};
.sql,opt: pass tree.SemaContext to the execbuilder
This commit updates the execbuilder to include the
tree.SemaContext
as one of its arguments. This will allow it to use the
AsOfSystemTime
information in a following commit.
Release note: None
opt: add execbuilder checks for bounded staleness queries
This commit adds checks in the execbuilder to ensure that bounded staleness
can only be used with queries that touch at most one row. It also applies
hints for such queries in the optbuilder to ensure that an invalid plan is
not produced. In particular, the hints ensure that no plans with an index
join or zigzag join will be produced.
Fixes #67558
Release note: None