colexec: add native support for COALESCE, IF, NULLIF #77658

Merged
merged 3 commits into from
Mar 15, 2022

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Mar 11, 2022

colexec: remove no longer used testing knob

Previously, a test helper method allowed falling back to the
row-by-row engine. This was needed at some point, but none of the
current callers use that ability, so we can easily remove it.

Release note: None

Release justification: testing only change.

colbuilder: order expressions lexicographically

Release note: None

Release justification: low risk cleanup.

colexec: add native support for COALESCE, IF, NULLIF

This commit adds the native vectorized support for CoalesceExpr,
IfExpr, and NullIfExpr by planning the equivalent CASE expressions.
Namely, for CoalesceExpr we do

CASE
  WHEN CoalesceExpr.Exprs[0] IS DISTINCT FROM NULL THEN CoalesceExpr.Exprs[0]
  WHEN CoalesceExpr.Exprs[1] IS DISTINCT FROM NULL THEN CoalesceExpr.Exprs[1]
  ...
END

for IfExpr we do

CASE WHEN IfExpr.Cond THEN IfExpr.True ELSE IfExpr.Else END

and for NullIfExpr we do

CASE WHEN Expr1 == Expr2 THEN NULL ELSE Expr1 END
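
As an illustration only (this is not code from the PR), the three rewrites above can be sketched in plain Go. The sketch assumes a hypothetical representation where a nil pointer stands for SQL NULL; the helper names `coalesce`, `ifExpr`, `nullIf`, `val`, and `show` are made up for the example and do not exist in colexec. Go evaluates arguments eagerly, so the sketch models only the result of each CASE expression, not its lazy evaluation.

```go
package main

import "fmt"

// val returns a pointer to v; a nil *int64 stands for SQL NULL (assumption).
func val(v int64) *int64 { return &v }

// coalesce mirrors
//   CASE WHEN e0 IS DISTINCT FROM NULL THEN e0 WHEN e1 ... END
// A CASE without an ELSE branch yields NULL when no WHEN arm matches.
func coalesce(exprs ...*int64) *int64 {
	for _, e := range exprs {
		if e != nil {
			return e
		}
	}
	return nil
}

// ifExpr mirrors CASE WHEN cond THEN t ELSE e END.
// A NULL condition (nil) is not true, so the ELSE branch is taken.
func ifExpr(cond *bool, t, e *int64) *int64 {
	if cond != nil && *cond {
		return t
	}
	return e
}

// nullIf mirrors CASE WHEN e1 = e2 THEN NULL ELSE e1 END.
// If either side is NULL the comparison is NULL, not true, so e1 is returned.
func nullIf(e1, e2 *int64) *int64 {
	if e1 != nil && e2 != nil && *e1 == *e2 {
		return nil
	}
	return e1
}

// show renders a nullable value for printing.
func show(d *int64) string {
	if d == nil {
		return "NULL"
	}
	return fmt.Sprint(*d)
}

func main() {
	yes := true
	fmt.Println(show(coalesce(nil, val(2), val(3)))) // 2
	fmt.Println(show(coalesce(nil, nil)))            // NULL
	fmt.Println(show(ifExpr(&yes, val(1), val(0))))  // 1
	fmt.Println(show(ifExpr(nil, val(1), val(0))))   // 0
	fmt.Println(show(nullIf(val(5), val(5))))        // NULL
	fmt.Println(show(nullIf(val(5), nil)))           // 5
}
```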

This commit additionally introduces some unit tests for these newly
supported expressions while extracting out some testing facilities from
the caseOp tests.

Fixes: #66015.

Release note: None

Release justification: low risk, high benefit change to existing
functionality.

@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the coalesce branch 3 times, most recently from c2a8d89 to c4273c7 Compare March 14, 2022 17:41
@yuzefovich yuzefovich changed the title from "WIP on CoalesceExpr and IfExpr" to "colexec: add native support for IfExpr and CoalesceExpr" Mar 14, 2022
@yuzefovich yuzefovich marked this pull request as ready for review March 14, 2022 18:48
@yuzefovich yuzefovich requested review from michae2, rharding6373 and a team March 14, 2022 18:48
@yuzefovich
Member Author

Given that this PR doesn't introduce any new operators, I think it's safe to merge during the stability period, so I'd really like to get this in.

The insight of converting these expressions to the equivalent CASE expressions came to me after I had already implemented them explicitly here :) The explicit implementation might be a bit faster, but I don't think it's worth it.

@yuzefovich yuzefovich changed the title from "colexec: add native support for IfExpr and CoalesceExpr" to "colexec: add native support for COALESCE, IF, NULLIF" Mar 14, 2022
Collaborator

@mgartner mgartner left a comment


An alternative is to implement these transformations as normalization rules in the optimizer. The rules for IF and NULLIF would be fairly simple.

Collaborator

@rharding6373 rharding6373 left a comment


Code :lgtm:, but I agree with Marcus. If it's always better to model these as CASE expressions, it would be better to apply the conversion earlier in the pipeline, as normalization rules, so that we have more visibility into the operators that are actually used and can apply optimizations to them.

Reviewed 10 of 10 files at r1, 1 of 1 files at r2, 6 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2, @rharding6373, and @yuzefovich)


pkg/sql/colexec/proj_utils_test.go, line 1 at r3 (raw file):

// Copyright 2022 The Cockroach Authors.

nit on the file name: I think something like test_utils.go would be more fitting, since this isn't defining actual tests.

Member Author

@yuzefovich yuzefovich left a comment


It's a good point, and I think we should do it. We already have a precedent in the vectorized planning of doing this conversion, so I think it's ok to introduce some more conversions there; thus, I filed #77799 to track this idea.

I'd really like to get the Coalesce support in for 22.1, so unless someone volunteers to implement this before the branch is cut (which I believe is tomorrow), I'll go ahead and merge as is :)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @rharding6373)


pkg/sql/colexec/proj_utils_test.go, line 1 at r3 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit on the file name: I think something like test_utils.go would be more fitting, since this isn't defining actual tests.

There are some subtle differences:

  • a .*_test.go file is test-only and is not included in the compiled binary, whereas a .*_test_utils.go file might be included in the binary;
  • code in a .*_test_utils.go file is visible from non-test code;

so I think it's good hygiene to put test-only code exclusively (when possible) in .*_test.go files.

@mgartner
Collaborator

I can try to write the rules, but probably won't make the branch cut. If the normalization rules are added later, would we keep this conversion code you're adding or remove it?

@yuzefovich
Member Author

If the normalization rules are added later, would we keep this conversion code you're adding or remove it?

The conversion code in execplan.go will then become obsolete, so we would remove it. We'd keep the unit tests, though.

@yuzefovich
Member Author

I'll merge this, and we'll remove the vectorized planning code whenever the normalization rules are added.

TFTR!

bors r+

@craig
Contributor

craig bot commented Mar 15, 2022

Build succeeded:

@craig craig bot merged commit eed2f30 into cockroachdb:master Mar 15, 2022
@yuzefovich yuzefovich deleted the coalesce branch March 15, 2022 00:55
@mgartner
Collaborator

Are you sure the changes for IF and NULLIF were required? We already build those expressions directly into CASE expressions in optbuilder.

case *tree.NullIfExpr:
	valType := t.ResolvedType()
	// Ensure that the type of the first expression matches the resolved type
	// of the NULLIF expression so that type inference will be correct in the
	// CASE expression constructed below. For example, the type of
	// NULLIF(NULL, 0) should be int.
	expr1 := reType(t.Expr1.(tree.TypedExpr), valType)
	input := b.buildScalar(expr1, inScope, nil, nil, colRefs)
	cond := b.buildScalar(t.Expr2.(tree.TypedExpr), inScope, nil, nil, colRefs)
	whens := memo.ScalarListExpr{
		b.factory.ConstructWhen(cond, b.factory.ConstructNull(valType)),
	}
	out = b.factory.ConstructCase(input, whens, input)

case *tree.IfExpr:
	valType := t.ResolvedType()
	input := b.buildScalar(t.Cond.(tree.TypedExpr), inScope, nil, nil, colRefs)
	// Re-typing the True expression should always succeed because they
	// are given the same type during type-checking.
	ifTrueExpr := reType(t.True.(tree.TypedExpr), valType)
	ifTrue := b.buildScalar(ifTrueExpr, inScope, nil, nil, colRefs)
	whens := memo.ScalarListExpr{b.factory.ConstructWhen(memo.TrueSingleton, ifTrue)}
	orElseExpr, ok := tree.ReType(t.Else.(tree.TypedExpr), valType)
	if !ok {
		panic(pgerror.Newf(
			pgcode.DatatypeMismatch,
			"IF types %s and %s cannot be matched",
			t.Else.(tree.TypedExpr).ResolvedType(), valType,
		))
	}
	orElse := b.buildScalar(orElseExpr, inScope, nil, nil, colRefs)
	out = b.factory.ConstructCase(input, whens, orElse)

defaultdb> CREATE TABLE t (a INT);
CREATE TABLE

defaultdb> EXPLAIN (VERBOSE) SELECT IF(a = 1, 'foo', 'bar') FROM t;
                             info
---------------------------------------------------------------
  distribution: full
  vectorized: true

  • render
  │ columns: ("if")
  │ estimated row count: 1,000 (missing stats)
  │ render if: CASE a = 1 WHEN true THEN 'foo' ELSE 'bar' END
  │
  └── • scan
        columns: (a)
        estimated row count: 1,000 (missing stats)
        table: t@primary
        spans: FULL SCAN
(13 rows)

defaultdb> EXPLAIN (VERBOSE) SELECT NULLIF(a, a) FROM t;
                                        info
------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • render
  │ columns: ("nullif")
  │ estimated row count: 1
  │ render nullif: CASE a WHEN a THEN CAST(NULL AS INT8) ELSE a END
  │
  └── • scan
        columns: (a)
        estimated row count: 1 (100% of the table; stats collected 32 seconds ago)
        table: t@primary
        spans: FULL SCAN
(13 rows)
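
For readers unfamiliar with the simple form of CASE that appears in the plans above (`CASE a WHEN a THEN ... ELSE a END`), here is a rough standalone Go sketch of how such an expression evaluates. It is not optbuilder or colexec code: the `when` struct and the `simpleCase` and `nullIf` helpers are hypothetical names, and a nil pointer is assumed to stand for SQL NULL.

```go
package main

import "fmt"

// when is a hypothetical WHEN/THEN pair; nil pointers stand for SQL NULL.
type when struct {
	match *int64
	then  *int64
}

// simpleCase evaluates CASE input WHEN w.match THEN w.then ... ELSE orElse END.
// Each arm is an equality test against input; a comparison involving NULL is
// never true, so such an arm is skipped.
func simpleCase(input *int64, whens []when, orElse *int64) *int64 {
	for _, w := range whens {
		if input != nil && w.match != nil && *input == *w.match {
			return w.then
		}
	}
	return orElse
}

// nullIf follows the shape shown in the plan: CASE a WHEN b THEN NULL ELSE a END.
func nullIf(a, b *int64) *int64 {
	return simpleCase(a, []when{{match: b, then: nil}}, a)
}

// show renders a nullable value for printing.
func show(d *int64) string {
	if d == nil {
		return "NULL"
	}
	return fmt.Sprint(*d)
}

func main() {
	seven, eight := int64(7), int64(8)
	fmt.Println(show(nullIf(&seven, &seven))) // NULL
	fmt.Println(show(nullIf(&seven, &eight))) // 7
	fmt.Println(show(nullIf(nil, nil)))       // NULL (via the ELSE branch)
}
```

In this sketch NULLIF(a, a) is NULL for every input: a non-NULL `a` matches the WHEN arm, and a NULL `a` falls through to `ELSE a`, which is itself NULL.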

@yuzefovich
Member Author

Interesting - no, I didn't check any end-to-end queries.

The reason I wanted to add support for IF was to use it in the randomized unit test of COALESCE. That code path doesn't go through the optimizer, so it's no surprise IfExpr wasn't supported. The explicit vectorized planning of IF therefore still makes sense, since it keeps that unit test working. I added NULLIF just because I assumed it was similar and simple, but it looks like the NULLIF support is a little bit of dead code, since I didn't use it in any unit tests.


Successfully merging this pull request may close these issues.

sql: vectorize COALESCE expressions
4 participants