-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colfetcher: use the limit hint to size the batch #63020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, when would a limit hint be available and no estimate? Is this expected, and should we fix in the optimizer too?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colfetcher/cfetcher.go, line 1073 at r1 (raw file):
var emitBatch bool if rf.machine.rowIdx == rf.machine.batch.Capacity() || (rf.machine.limitHint > 0 && rf.machine.rowIdx == rf.machine.limitHint) {
nit: I see you changed the "inaccurate" >= to == in both places. I was taught as a young programmer to prefer using >= for conditions like this because it limits the effects of other broken code in the system. I agree that I can see no reason why the index would exceed the capacity, and indeed many things worse could go wrong, but I think it's a good habit to keep a >= to "limit the blast radius" in case other bugs creep in.
I think such scenario can arise when there are no stats on the table, but the query has a LIMIT. |
4eeeedb
to
8e39f10
Compare
The benchmarks of this change:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/colfetcher/cfetcher.go, line 1073 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
nit: I see you changed the "inaccurate" >= to == in both places. I was taught as a young programmer to prefer using >= for conditions like this because it limits the effects of other broken code in the system. I agree that I can see no reason why the index would exceed the capacity, and indeed many things worse could go wrong, but I think it's a good habit to keep a >= to "limit the blast radius" in case other bugs creep in.
Fair point, restored.
8e39f10
to
c012099
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colfetcher/cfetcher.go, line 682 at r2 (raw file):
rf.machine.limitHint = int(limitHint) rf.machine.state[0] = stateResetBatch rf.machine.state[1] = stateInitFetch
Why the change to have the state machine do the reset? I'm okay with it but just curious.
pkg/sql/colfetcher/cfetcher.go, line 1087 at r2 (raw file):
// Note that limitHint might become negative at which point we // will start ignoring it. rf.machine.limitHint -= rf.machine.rowIdx
This could conceivably overflow if we did a int63-sized scan, which I guess is not really possible, but why bother continuing to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/colfetcher/cfetcher.go, line 682 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Why the change to have the state machine do the reset? I'm okay with it but just curious.
I included the reasoning in the commit message. Without this (or some other) change, we call resetBatch
in cFetcher.Init
which is called in initCRowFetcher
which is called in NewColBatchScan
; but at that time the limit hint is not yet set - that is done in cFetcher.StartScan
which is called in ColBatchScan.Init
; thus, we could allocate a batch of capacity 1 first, and only then we would use the limit hint to size the batch.
pkg/sql/colfetcher/cfetcher.go, line 1087 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This could conceivably overflow if we did a int63-sized scan, which I guess is not really possible, but why bother continuing to track this?
My thinking was that unconditional subtraction is faster than having a conditional in place, and the code ends being cleaner overall. Yeah, I'm not worried about the underflowing, but happy to change this code if you see a clean way to implement it. I came up with something like
if rf.machine.limitHint > 0 {
// Update the limit hint to track the expected remaining rows to
// be fetched.
if rf.machine.limitHint >= rf.machine.rowIdx {
rf.machine.limitHint -= rf.machine.rowIdx
} else {
rf.machine.limitHint = 0
}
}
We recently merged a couple of commits to take advantage of the optimizer-driven estimated row count that will be output by the cfetcher and of the limit hint when available. However, one scenario was overlooked - if there is no estimate but the limit hint is available, we don't use the latter to size the original batch, and now we do. As an additional optimization this commit refactors the initial state of the state machine so that we didn't allocate the output batch in the constructor of the ColBatchScan but do it only on the very first call to `NextBatch`. This was prompted by the fact that the limit hint is currently only set in `StartScan` which is called in `ColBatchScan.Init`, so without some changes, in the case when the limit hint can be used to size the batch we would allocate a redundant batch of size one in `NewColBatchScan`. I also think that the new initial state of the state machine makes a bit more sense to me. Additionally, this commit generalizes the logic to keep track of the limit hint as "remaining rows to be fetched." Release note: None (this is a follow-up to a recently merged commit that had a release note)
c012099
to
1505e48
Compare
I think we'll want to backport this to 21.1.0, so I'll go ahead and merge. TFTR! bors r+ |
Build succeeded: |
pkg/sql/colfetcher/cfetcher.go, line 328 at r2 (raw file):
Sorry for the late drive-by, but in a future commit you could consider adding another case for soft limit as follows:
Not sure if this is always desirable, but could improve performance in some cases. |
pkg/sql/colfetcher/cfetcher.go, line 328 at r2 (raw file): Previously, rytaft (Rebecca Taft) wrote…
(also, not sure if soft limit is actually used to set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/colfetcher/cfetcher.go, line 328 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
(also, not sure if soft limit is actually used to set
rf.machine.limitHint
, that would be a prerequisite for this change...)
Thanks! I'll open up a PR to address this.
We recently merged a couple of commits to take advantage of the
optimizer-driven estimated row count that will be output by the cfetcher
and of the limit hint when available. However, one scenario was
overlooked - if there is no estimate but the limit hint is available, we
don't use the latter to size the original batch, and now we do.
As an additional optimization this commit refactors the initial state of
the state machine so that we didn't allocate the output batch in the
constructor of the ColBatchScan but do it only on the very first call
to
NextBatch
. This was prompted by the fact that the limit hint iscurrently only set in
StartScan
which is called inColBatchScan.Init
, so without some changes, in the case when the limithint can be used to size the batch we would allocate a redundant batch
of size one in
NewColBatchScan
. I also think that the new initialstate of the state machine makes a bit more sense to me.
Additionally, this commit generalizes the logic to keep track of the
limit hint as "remaining rows to be fetched."
Addresses: #62212.
Release note: None (this is a follow-up to a recently merged commit that
had a release note)