Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: handle nulls in selection operators #37281

Merged
merged 1 commit into from
May 9, 2019

Conversation

solongordon
Copy link
Contributor

Our selection operators now properly handle null values. This required a
bit of extra templating. I also added a Nulls.Or implementation which
does a bitwise or between two null bitmaps, which is convenient (and
efficient) for combining two columns with nulls.

Note that the implementation actually performs the selection before
doing the null check. This is because I'm assuming that in general it's
more likely that a value will be selected out due to the comparison than
due to being null. This would not be true for queries with lots of nulls
or a very permissive selection, but that seems like a minority of cases.

I also updated the selection benchmarks to remove most of the
randomness, which was contributing to unpredictable benchmarks. For
instance, a very low constant value in the selLTInt64Int64ConstOp
benchmark would cause significantly faster results than a very high one.

Benchmarks:

BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=true-4         	 1000000	      1256 ns/op	6517.93 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=false-4        	 2000000	       725 ns/op	11295.46 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=true-4        	 1000000	      1243 ns/op	6588.45 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=false-4       	 3000000	       491 ns/op	16663.51 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=true-4              	 1000000	      1603 ns/op	10216.66 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=false-4             	 2000000	       725 ns/op	22579.94 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=true-4             	 1000000	      1382 ns/op	11854.27 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=false-4            	 3000000	       542 ns/op	30181.99 MB/s

Refers #36931

Release note: None

@solongordon solongordon requested a review from a team May 2, 2019 21:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@solongordon
Copy link
Contributor Author

This still needs tests but otherwise is ready for a look. Also if the approach looks good I'll do the same for the projection operators in a follow-up PR.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Definitely, having deterministic benchmarks is the right approach.

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)


pkg/sql/exec/coldata/nulls.go, line 238 at r1 (raw file):

// Or returns a new Nulls vector where NullAt(i) iff n1.NullAt(i) or
// n2.NullAt(i).
func (n1 *Nulls) Or(n2 *Nulls) *Nulls {

There a few fast-path cases depending on the values of hasNulls, I think it's worth writing out those explicitly (like both don't have nulls then we can avoid copying etc.).


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):

	sel := batch.Selection()
	col1 = col1[:n]
	col2 = col2[:len(col1)]

[nit]: I'd use col2 = col2[:n] (unless there is a performance reason not to).


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):

		var idx uint16
		if vec1.HasNulls() || vec2.HasNulls() {
			var nulls *coldata.Nulls

Oh, that's where the fast-path logic is. Hm, I still think it's worth having it in the Or method.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/exec/coldata/nulls.go, line 238 at r1 (raw file):

Previously, yuzefovich wrote…

There a few fast-path cases depending on the values of hasNulls, I think it's worth writing out those explicitly (like both don't have nulls then we can avoid copying etc.).

Good point, will add.


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):

Previously, yuzefovich wrote…

[nit]: I'd use col2 = col2[:n] (unless there is a performance reason not to).

I know it looks silly but it eliminates a bounds check a few lines down. There will be a bounds check on col1[i] but not col2[i] since the compiler knows the slices are the same length. For some reason it doesn't infer that if you just slice them both to n length.


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):

Previously, yuzefovich wrote…

Oh, that's where the fast-path logic is. Hm, I still think it's worth having it in the Or method.

Yeah, I agree it would be better to live there. We'll still need to do an explicit check here for the case where neither of them have nulls but that's fine.

@solongordon solongordon force-pushed the selection-nulls branch 3 times, most recently from 0308558 to 45062a9 Compare May 6, 2019 22:01
Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)

@solongordon
Copy link
Contributor Author

Blocked by #37358 which is causing the failing logic test

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 84 at r1 (raw file):

Previously, solongordon (Solon) wrote…

I know it looks silly but it eliminates a bounds check a few lines down. There will be a bounds check on col1[i] but not col2[i] since the compiler knows the slices are the same length. For some reason it doesn't infer that if you just slice them both to n length.

Got it, thanks.


pkg/sql/exec/execgen/cmd/execgen/selection_ops_gen.go, line 156 at r1 (raw file):

Previously, solongordon (Solon) wrote…

Yeah, I agree it would be better to live there. We'll still need to do an explicit check here for the case where neither of them have nulls but that's fine.

Makes sense.

Our selection operators now properly handle null values. This required a
bit of extra templating. I also added a Nulls.Or implementation which
does a bitwise or between two null bitmaps, which is convenient (and
efficient) for combining two columns with nulls.

Note that the implementation actually performs the selection before
doing the null check. This is because I'm assuming that in general it's
more likely that a value will be selected out due to the comparison than
due to being null. This would not be true for queries with lots of nulls
or a very permissive selection, but that seems like a minority of cases.

I also updated the selection benchmarks to remove most of the
randomness, which was contributing to unpredictable benchmarks. For
instance, a very low constant value in the selLTInt64Int64ConstOp
benchmark would cause significantly faster results than a very high one.

Benchmarks:
```
BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=true-4         	 1000000	      1256 ns/op	6517.93 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=false-4        	 2000000	       725 ns/op	11295.46 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=true-4        	 1000000	      1243 ns/op	6588.45 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=false-4       	 3000000	       491 ns/op	16663.51 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=true-4              	 1000000	      1603 ns/op	10216.66 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=false-4             	 2000000	       725 ns/op	22579.94 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=true-4             	 1000000	      1382 ns/op	11854.27 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=false-4            	 3000000	       542 ns/op	30181.99 MB/s
```

Refers cockroachdb#36931

Release note: None
@solongordon
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 9, 2019
37281: exec: handle nulls in selection operators r=solongordon a=solongordon

Our selection operators now properly handle null values. This required a
bit of extra templating. I also added a Nulls.Or implementation which
does a bitwise or between two null bitmaps, which is convenient (and
efficient) for combining two columns with nulls.

Note that the implementation actually performs the selection before
doing the null check. This is because I'm assuming that in general it's
more likely that a value will be selected out due to the comparison than
due to being null. This would not be true for queries with lots of nulls
or a very permissive selection, but that seems like a minority of cases.

I also updated the selection benchmarks to remove most of the
randomness, which was contributing to unpredictable benchmarks. For
instance, a very low constant value in the selLTInt64Int64ConstOp
benchmark would cause significantly faster results than a very high one.

Benchmarks:
```
BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=true-4         	 1000000	      1256 ns/op	6517.93 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=true,hasNulls=false-4        	 2000000	       725 ns/op	11295.46 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=true-4        	 1000000	      1243 ns/op	6588.45 MB/s
BenchmarkSelLTInt64Int64ConstOp/useSel=false,hasNulls=false-4       	 3000000	       491 ns/op	16663.51 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=true-4              	 1000000	      1603 ns/op	10216.66 MB/s
BenchmarkSelLTInt64Int64Op/useSel=true,hasNulls=false-4             	 2000000	       725 ns/op	22579.94 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=true-4             	 1000000	      1382 ns/op	11854.27 MB/s
BenchmarkSelLTInt64Int64Op/useSel=false,hasNulls=false-4            	 3000000	       542 ns/op	30181.99 MB/s
```

Refers #36931

Release note: None

Co-authored-by: Solon Gordon <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 9, 2019

Build succeeded

@craig craig bot merged commit 13c985e into cockroachdb:master May 9, 2019
@solongordon solongordon deleted the selection-nulls branch May 10, 2019 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants