-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: handle nulls in selection operators #37281
Conversation
This still needs tests but otherwise is ready for a look. Also if the approach looks good I'll do the same for the projection operators in a follow-up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Definitely, having deterministic benchmarks is the right approach.
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
pkg/sql/exec/coldata/nulls.go, line 238 at r1 (raw file):
// Or returns a new Nulls vector where NullAt(i) iff n1.NullAt(i) or // n2.NullAt(i). func (n1 *Nulls) Or(n2 *Nulls) *Nulls {
There a few fast-path cases depending on the values of hasNulls
, I think it's worth writing out those explicitly (like both don't have nulls then we can avoid copying etc.).
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):
sel := batch.Selection() col1 = col1[:n] col2 = col2[:len(col1)]
[nit]: I'd use col2 = col2[:n]
(unless there is a performance reason not to).
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):
var idx uint16 if vec1.HasNulls() || vec2.HasNulls() { var nulls *coldata.Nulls
Oh, that's where the fast-path logic is. Hm, I still think it's worth having it in the Or
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/exec/coldata/nulls.go, line 238 at r1 (raw file):
Previously, yuzefovich wrote…
There a few fast-path cases depending on the values of
hasNulls
, I think it's worth writing out those explicitly (like both don't have nulls then we can avoid copying etc.).
Good point, will add.
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):
Previously, yuzefovich wrote…
[nit]: I'd use
col2 = col2[:n]
(unless there is a performance reason not to).
I know it looks silly but it eliminates a bounds check a few lines down. There will be a bounds check on col1[i]
but not col2[i]
since the compiler knows the slices are the same length. For some reason it doesn't infer that if you just slice them both to n
length.
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):
Previously, yuzefovich wrote…
Oh, that's where the fast-path logic is. Hm, I still think it's worth having it in the
Or
method.
Yeah, I agree it would be better to live there. We'll still need to do an explicit check here for the case where neither of them have nulls but that's fine.
0308558
to
45062a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
Blocked by #37358 which is causing the failing logic test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):
Previously, solongordon (Solon) wrote…
I know it looks silly but it eliminates a bounds check a few lines down. There will be a bounds check on
col1[i]
but notcol2[i]
since the compiler knows the slices are the same length. For some reason it doesn't infer that if you just slice them both ton
length.
Got it, thanks.
pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):
Previously, solongordon (Solon) wrote…
Yeah, I agree it would be better to live there. We'll still need to do an explicit check here for the case where neither of them have nulls but that's fine.
Makes sense.
Our selection operators now properly handle null values. This required a bit of extra templating. I also added a Nulls.Or implementation which does a bitwise or between two null bitmaps, which is convenient (and efficient) for combining two columns with nulls. Note that the implementation actually performs the selection before doing the null check. This is because I'm assuming that in general it's more likely that a value will be selected out due to the comparison than due to being null. This would not be true for queries with lots of nulls or a very permissive selection, but that seems like a minority of cases. I also updated the selection benchmarks to remove most of the randomness, which was contributing to unpredictable benchmarks. For instance, a very low constant value in the selLTInt64Int64ConstOp benchmark would cause significantly faster results than a very high one. Benchmarks: ``` BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=true-4 1000000 1256 ns/op 6517.93 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=false-4 2000000 725 ns/op 11295.46 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=true-4 1000000 1243 ns/op 6588.45 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=false-4 3000000 491 ns/op 16663.51 MB/s BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=true-4 1000000 1603 ns/op 10216.66 MB/s BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=false-4 2000000 725 ns/op 22579.94 MB/s BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=true-4 1000000 1382 ns/op 11854.27 MB/s BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=false-4 3000000 542 ns/op 30181.99 MB/s ``` Refers cockroachdb#36931 Release note: None
45062a9
to
13c985e
Compare
bors r+ |
37281: exec: handle nulls in selection operators r=solongordon a=solongordon Our selection operators now properly handle null values. This required a bit of extra templating. I also added a Nulls.Or implementation which does a bitwise or between two null bitmaps, which is convenient (and efficient) for combining two columns with nulls. Note that the implementation actually performs the selection before doing the null check. This is because I'm assuming that in general it's more likely that a value will be selected out due to the comparison than due to being null. This would not be true for queries with lots of nulls or a very permissive selection, but that seems like a minority of cases. I also updated the selection benchmarks to remove most of the randomness, which was contributing to unpredictable benchmarks. For instance, a very low constant value in the selLTInt64Int64ConstOp benchmark would cause significantly faster results than a very high one. Benchmarks: ``` BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=true-4 1000000 1256 ns/op 6517.93 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=false-4 2000000 725 ns/op 11295.46 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=true-4 1000000 1243 ns/op 6588.45 MB/s BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=false-4 3000000 491 ns/op 16663.51 MB/s BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=true-4 1000000 1603 ns/op 10216.66 MB/s BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=false-4 2000000 725 ns/op 22579.94 MB/s BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=true-4 1000000 1382 ns/op 11854.27 MB/s BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=false-4 3000000 542 ns/op 30181.99 MB/s ``` Refers #36931 Release note: None Co-authored-by: Solon Gordon <[email protected]>
Build succeeded |
Our selection operators now properly handle null values. This required a
bit of extra templating. I also added a Nulls.Or implementation which
does a bitwise or between two null bitmaps, which is convenient (and
efficient) for combining two columns with nulls.
Note that the implementation actually performs the selection before
doing the null check. This is because I'm assuming that in general it's
more likely that a value will be selected out due to the comparison than
due to being null. This would not be true for queries with lots of nulls
or a very permissive selection, but that seems like a minority of cases.
I also updated the selection benchmarks to remove most of the
randomness, which was contributing to unpredictable benchmarks. For
instance, a very low constant value in the selLTInt64Int64ConstOp
benchmark would cause significantly faster results than a very high one.
Benchmarks:
Refers #36931
Release note: None