colbuilder: optimize IS DISTINCT FROM NULL when null is casted #63802

Merged
merged 1 commit into from
Apr 21, 2021

yuzefovich
Member

@yuzefovich yuzefovich commented Apr 16, 2021

We have an optimized operator for the `Is{Not}DistinctFrom` operation, which
we can currently plan only if the right side is a constant NULL. In some
cases the optimizer might create a cast expression on the right in order
to propagate the type of the null, and previously we would fall back to
the default comparison operator in such a scenario. This is suboptimal,
and this commit fixes the issue by special-casing the scenario of
casting NULL to some type.

Fixes: #63792.

Release note: None
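
To illustrate the idea in the description, here is a minimal sketch of detecting a constant NULL that may be wrapped in a cast. The types `Expr`, `Null`, `Cast` and `Column` and the helper `isNullOrCastedNull` are hypothetical stand-ins, not the actual `tree` package types or the code in this PR.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for a typed expression tree node.
type Expr interface{ String() string }

// Null models a constant NULL.
type Null struct{}

func (Null) String() string { return "NULL" }

// Cast models CAST(Input AS Type).
type Cast struct {
	Input Expr
	Type  string
}

func (c Cast) String() string { return fmt.Sprintf("CAST(%s AS %s)", c.Input, c.Type) }

// Column models a reference to an input column.
type Column struct{ Name string }

func (c Column) String() string { return c.Name }

// isNullOrCastedNull reports whether e is a constant NULL, possibly wrapped
// in a single cast whose only purpose is to carry the type of the NULL.
func isNullOrCastedNull(e Expr) bool {
	if c, ok := e.(Cast); ok {
		e = c.Input
	}
	_, ok := e.(Null)
	return ok
}

func main() {
	exprs := []Expr{
		Null{},
		Cast{Input: Null{}, Type: "INT8"},
		Cast{Input: Column{Name: "a"}, Type: "INT8"},
		Column{Name: "a"},
	}
	for _, e := range exprs {
		fmt.Printf("%-22s optimized path: %t\n", e, isNullOrCastedNull(e))
	}
}
```

Only the first two expressions qualify for the optimized path in this sketch; a cast of a column is not a constant NULL.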

@yuzefovich yuzefovich requested review from jordanlewis, michae2 and a team April 16, 2021 21:36
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

I guess this pre-evaluation is similar to "constant folding" in the optimizer. I'm curious whether there are other cases in which we can know that a tree.TypedExpr evaluates to a constant so that we could pre-evaluate it.

@RaduBerinde
Member

Normally, the optimizer folds all casts. The exception is with Nulls, for which there is no way to pass them down directly with the correct type.
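
A rough sketch of that folding rule, with the NULL exception made explicit. Everything here (`StrConst`, `IntConst`, `Null`, `Cast`, `foldCast`) is a hypothetical model for illustration and not the optimizer's actual data structures.

```go
package main

import (
	"fmt"
	"strconv"
)

// Expr is a hypothetical expression node.
type Expr interface{}

// StrConst and IntConst model typed constants.
type StrConst struct{ Val string }
type IntConst struct{ Val int64 }

// Null models an untyped constant NULL.
type Null struct{}

// Cast models CAST(Input AS Type).
type Cast struct {
	Input Expr
	Type  string
}

// foldCast replaces a cast of a constant with the resulting constant when it
// can. A cast of NULL is kept as is: in this model the cast is the only
// thing that carries the NULL's type, so folding it away would lose the type.
func foldCast(c Cast) Expr {
	switch in := c.Input.(type) {
	case Null:
		return c
	case StrConst:
		if c.Type == "INT8" {
			if v, err := strconv.ParseInt(in.Val, 10, 64); err == nil {
				return IntConst{Val: v}
			}
		}
	}
	// Anything else is left unfolded in this sketch.
	return c
}

func main() {
	fmt.Printf("%#v\n", foldCast(Cast{Input: StrConst{Val: "42"}, Type: "INT8"}))
	fmt.Printf("%#v\n", foldCast(Cast{Input: Null{}, Type: "INT8"}))
}
```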

@yuzefovich
Member Author

Makes sense, thanks.

Collaborator

@michae2 michae2 left a comment

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colexec/colbuilder/execplan.go, line 2165 at r1 (raw file):

		// The projection result will be outputted to a new column which is
		// appended to the input batch.
		op, err = colexecproj.GetProjectionLConstOperator(

It looks like colexecproj.GetProjectionLConstOperator calls colconv.GetDatumToPhysicalFn, which panics if passed types.Unknown, so I'm not sure it is a good idea to do this pre-evaluation for the left side. But this is theoretical, because I also couldn't figure out how to get the optimizer to emit a constant null on the left side of a binary operator 😄

Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @michae2)


pkg/sql/colexec/colbuilder/execplan.go, line 2165 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

It looks like colexecproj.GetProjectionLConstOperator calls colconv.GetDatumToPhysicalFn, which panics if passed types.Unknown, so I'm not sure it is a good idea to do this pre-evaluation for the left side. But this is theoretical, because I also couldn't figure out how to get the optimizer to emit a constant null on the left side of a binary operator 😄

In this case the null is well-typed after evaluating the cast expression (the whole idea of creating a cast expression like NULL::INT8 in the first place is to propagate the correct type), so we will see types.Int and not types.Unknown (the latter is only for "untyped" nulls).

Collaborator

@michae2 michae2 left a comment

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colexec/colbuilder/execplan.go, line 2165 at r1 (raw file):
Ah, I'm confused then, because when I was looking in the debugger it did not look well-typed. (This is right, not left, because like I said I can't figure out how to get the optimizer to emit a null on the left side.)

(dlv) print right
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.TypedExpr(github.com/cockroachdb/cockroach/pkg/sql/sem/tree.dNull) {}

(dlv) call right.ResolvedType()
> github.com/cockroachdb/cockroach/pkg/sql/colexec/colbuilder.planProjectionExpr() ./pkg/sql/colexec/colbuilder/execplan.go:2145 (PC: 0x6f3ca86)
Warning: debugging optimized function
Values returned:
	~r0: *github.com/cockroachdb/cockroach/pkg/sql/types.T {
		InternalType: github.com/cockroachdb/cockroach/pkg/sql/types.InternalType {
			Family: UnknownFamily (13),
			Width: 0,
			Precision: 0,
			ArrayDimensions: []int32 len: 0, cap: 0, nil,
			Locale: *"",
			VisibleType: 0,
			ArrayElemType: *github.com/cockroachdb/cockroach/pkg/sql/types.Family nil,
			TupleContents: []*github.com/cockroachdb/cockroach/pkg/sql/types.T len: 0, cap: 0, nil,
			TupleLabels: []string len: 0, cap: 0, nil,
			Oid: T_unknown (705),
			ArrayContents: *github.com/cockroachdb/cockroach/pkg/sql/types.T nil,
			TimePrecisionIsSet: false,
			IntervalDurationField: *github.com/cockroachdb/cockroach/pkg/sql/types.IntervalDurationField nil,
			GeoMetadata: *github.com/cockroachdb/cockroach/pkg/sql/types.GeoMetadata nil,
			UDTMetadata: *github.com/cockroachdb/cockroach/pkg/sql/types.PersistentUserDefinedTypeMetadata nil,},
		TypeMeta: github.com/cockroachdb/cockroach/pkg/sql/types.UserDefinedTypeMetadata {
			Name: *github.com/cockroachdb/cockroach/pkg/sql/types.UserDefinedTypeName nil,
			Version: 0,
			EnumData: *github.com/cockroachdb/cockroach/pkg/sql/types.EnumMetadata nil,},}

  2140:		}
  2141:		right, err = preEvaluateConstCast(evalCtx, right)
  2142:		if err != nil {
  2143:			return nil, resultIdx, typs, err
  2144:		}
=>2145:		allocator := colmem.NewAllocator(ctx, acc, factory)
  2146:		resultIdx = -1
  2147:		// There are 3 cases. Either the left is constant, the right is constant,
  2148:		// or neither are constant.
  2149:		if lConstArg, lConst := left.(tree.Datum); lConst {
  2150:			// Case one: The left is constant.

But I see the TypedExpr wrapper on there... hmm. Am I reading the debugger output wrong?

Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @michae2)


pkg/sql/colexec/colbuilder/execplan.go, line 2165 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Ah, I'm confused then, because when I was looking in the debugger it did not look well-typed. (This is right, not left, because like I said I can't figure out how to get the optimizer to emit a null on the left side.)

But I see the TypedExpr wrapper on there... hmm. Am I reading the debugger output wrong?

No, you're totally right, nice catch:

if d == DNull {
	return d, nil
}

I'm not yet sure whether this PR needs an adjustment, but I'll need to think a bit more about it.
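
A small model of why the debugger shows an untyped null after the cast is evaluated: NULL is a single shared value, and the early return quoted above hands it back unchanged. The names `Type`, `Datum`, `DInt` and `performCast` below are simplified, hypothetical stand-ins rather than the real `tree` and `types` APIs.

```go
package main

import "fmt"

// Type is a simplified stand-in for a SQL type.
type Type string

const (
	Unknown Type = "unknown"
	Int     Type = "int8"
)

// Datum is a simplified stand-in for an evaluated value.
type Datum interface{ ResolvedType() Type }

type dNull struct{}

// The shared NULL value has no type of its own.
func (dNull) ResolvedType() Type { return Unknown }

// DNull is the single shared NULL value.
var DNull Datum = dNull{}

// DInt models an integer datum.
type DInt int64

func (DInt) ResolvedType() Type { return Int }

// performCast mirrors the early return quoted above: NULL is returned as is,
// so the target type never gets attached to it.
func performCast(d Datum, to Type) (Datum, error) {
	if d == DNull {
		return d, nil
	}
	if v, ok := d.(DInt); ok && to == Int {
		return v, nil
	}
	return nil, fmt.Errorf("unsupported cast of %T to %s", d, to)
}

func main() {
	d, err := performCast(DNull, Int)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.ResolvedType()) // prints "unknown"
}
```

In this model, any code downstream of the cast that dispatches on `ResolvedType()` sees the unknown type, which matches what the debugger output shows.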

@yuzefovich yuzefovich changed the title colbuilder: pre-evaluate casts of constants in projection exprs colbuilder: optimize IS DISTINCT FROM NULL when nulls is casted Apr 20, 2021
@yuzefovich yuzefovich changed the title colbuilder: optimize IS DISTINCT FROM NULL when nulls is casted colbuilder: optimize IS DISTINCT FROM NULL when null is casted Apr 20, 2021
Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @michae2)


pkg/sql/colexec/colbuilder/execplan.go, line 2165 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

No, you're totally right, nice catch:

if d == DNull {
	return d, nil
}

I'm not yet sure whether this PR needs an adjustment, but I'll need to think a bit more about it.

I decided to err on the side of safety and introduce special-case behavior only for Is{Not}DistinctFrom operations.
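
For context on why the dedicated operator is worth planning, here is a minimal sketch of what a vectorized `IS [NOT] DISTINCT FROM NULL` can look like: it only consults the column's null flags and never compares values. `Int64Vec` and `isDistinctFromNull` are hypothetical names for illustration, not the colexec implementation.

```go
package main

import "fmt"

// Int64Vec is a hypothetical column vector: values plus per-row NULL flags.
type Int64Vec struct {
	Vals  []int64
	Nulls []bool // Nulls[i] is true when row i is NULL
}

// isDistinctFromNull computes `col IS DISTINCT FROM NULL` for every row when
// notDistinct is false, and `col IS NOT DISTINCT FROM NULL` when it is true.
// The result is never NULL itself, and the values are never inspected.
func isDistinctFromNull(col Int64Vec, notDistinct bool) []bool {
	out := make([]bool, len(col.Nulls))
	for i, isNull := range col.Nulls {
		out[i] = isNull == notDistinct
	}
	return out
}

func main() {
	col := Int64Vec{
		Vals:  []int64{1, 0, 3},
		Nulls: []bool{false, true, false},
	}
	fmt.Println(isDistinctFromNull(col, false))
	fmt.Println(isDistinctFromNull(col, true))
}
```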

Collaborator

@michae2 michae2 left a comment

:lgtm:

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)

@yuzefovich
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Apr 20, 2021
63238: roachtest: update libpq blocklist to ignore TestCopyInBinaryError r=rafiss a=RichardJCai

roachtest: update libpq blocklist to ignore TestCopyInBinaryError

TestCopyInBinary's behaviour was incorrect in the test since we were not receiving an expected error (`pq: only text format supported for COPY`).
Furthermore, the test would sporadically panic, causing the following tests to fail.

Release note: None

Resolves #57855 

63244: logictest: compare floating point values approximately on s390x r=ajwerner a=jonathan-albrecht-ibm

### Overview
On s390x in the std math package and some c-deps, floating point calculations can produce results that differ from the values calculated on amd64. This patch adds a function to compare logictest floating point and decimal values within a small relative margin on s390x. The existing behavior on all other platforms remains the same.

On s390x, there are three main reasons that floating point calculations sometimes give different results:
* the Go compiler generates the s390x "fused multiply and add" (FMA) instruction where possible,
* the Go math package uses s390x-optimized versions of some functions,
* some C libraries, e.g. libgeos and libproj, also have platform-specific floating-point calculation differences.
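
As a small illustration of the first point, the sketch below contrasts a multiply-then-add that rounds the product with a fused multiply-add that rounds only once. The helper names and the chosen inputs are made up for this example; `math.FMA` is the standard library function.

```go
package main

import (
	"fmt"
	"math"
)

// separate rounds the product to float64 before adding. The explicit
// conversion prevents the compiler from fusing the two operations.
func separate(a, b, c float64) float64 {
	return float64(a*b) + c
}

// fused computes a*b + c with a single rounding at the end.
func fused(a, b, c float64) float64 {
	return math.FMA(a, b, c)
}

// exampleInputs returns x = 1 + 2^-30 and c = -(1 + 2^-29). The exact value
// of x*x is 1 + 2^-29 + 2^-60, and the last term is lost when the product is
// rounded to float64.
func exampleInputs() (x, c float64) {
	x = 1 + math.Ldexp(1, -30)
	c = -(1 + math.Ldexp(1, -29))
	return x, c
}

func main() {
	x, c := exampleInputs()
	fmt.Println("separate:", separate(x, x, c))
	fmt.Println("fused:   ", fused(x, x, c))
}
```

With these inputs the separately rounded version yields exactly zero while the fused version keeps the small 2^-60 term, which is the kind of last-bits difference that can surface in printed test output.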

### Proposal
The motivation for this work is so that users building CRDB on s390x do not need to diagnose tests that fail because of platform dependent floating point differences.

This PR proposes one possible approach to dealing with platform dependent floating point differences. Since development, testing and CI are done on amd64 it keeps the current logic for determining float equality exactly the same. On s390x, it determines values of decimal and float column types (R and F) in query tests to be equal if they are within a tolerance. See the new pkg/testutils/floatcmp package for the implementation of the approximate equality logic and changes in logictest.go to see how it is applied to only s390x.
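
A minimal sketch of the kind of check described above, assuming a relative tolerance and a per-architecture switch. The function names and the tolerance value are hypothetical and are not taken from the pkg/testutils/floatcmp package.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// approxEqual reports whether a and b agree within the relative tolerance
// relTol. NaNs never match, and infinities only match themselves exactly.
func approxEqual(a, b, relTol float64) bool {
	if a == b {
		return true
	}
	if math.IsNaN(a) || math.IsNaN(b) || math.IsInf(a, 0) || math.IsInf(b, 0) {
		return false
	}
	scale := math.Max(math.Abs(a), math.Abs(b))
	return math.Abs(a-b) <= relTol*scale
}

// valuesMatch keeps exact comparison everywhere except on s390x, where a
// small relative margin is allowed. The 1e-9 tolerance is an assumption
// chosen for this example.
func valuesMatch(expected, actual float64, goarch string) bool {
	if goarch == "s390x" {
		return approxEqual(expected, actual, 1e-9)
	}
	return expected == actual
}

func main() {
	expected, actual := 1.0, 1.0+1e-12
	fmt.Println("this platform:", valuesMatch(expected, actual, runtime.GOARCH))
	fmt.Println("s390x:        ", valuesMatch(expected, actual, "s390x"))
}
```

Passing the architecture as a parameter instead of reading `runtime.GOARCH` inside the comparison keeps the exact-match path on other platforms easy to test.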

There are probably other approaches I haven't thought of that would also work. I'd like to use this proposal to start a conversation on how all tests in CRDB that currently fail due to expected floating point differences could eventually be made to pass.

Of course platforms other than s390x may also have differences but I haven't looked at any other platforms. The changes should be easily extendable to other platforms if needed.

### Future Work
The changes in this PR allow the following tests to pass on s390x:
* TestLogic/fakedist-disk/builtin_function/extra_float_digits_3
* TestLogic/fakedist-metadata/builtin_function/extra_float_digits_3
* TestLogic/fakedist-vec-off/builtin_function/extra_float_digits_3
* TestLogic/fakedist/builtin_function/extra_float_digits_3
* TestLogic/local-spec-planning/builtin_function/extra_float_digits_3
* TestLogic/local-vec-off/builtin_function/extra_float_digits_3
* TestLogic/local/builtin_function/extra_float_digits_3

There are about 70 more tests that currently fail due to platform floating point differences on s390x, many are tests of geospatial functions. Assuming we can come up with a good approach, I'd like to continue working on fixes to be submitted in future PRs.

Release note: None

63802: colbuilder: optimize IS DISTINCT FROM NULL when null is casted r=yuzefovich a=yuzefovich

We have an optimized operator for the `Is{Not}DistinctFrom` operation, which
we can currently plan only if the right side is a constant NULL. In some
cases the optimizer might create a cast expression on the right in order
to propagate the type of the null, and previously we would fall back to
the default comparison operator in such a scenario. This is suboptimal,
and this commit fixes the issue by special-casing the scenario of
casting NULL to some type.

Fixes: #63792.

Release note: None

63903: sql: mark planNodeToRowSource as streaming intelligently r=yuzefovich a=yuzefovich

Previously, out of an abundance of caution (and some laziness), we marked
all `planNodeToRowSource` processors as being of a "streaming" nature. This
marker influences whether we wrap it with a streaming or buffering
columnarizer into the vectorized flow. However, doing so is unnecessary
in most cases and kills some of the benefits of the vectorized model.
The only special planNode is `hookFnNode`, which must be streaming; all
others are safe to have buffering around them. This commit implements
that idea. This required adding another method to the `Processor` interface.

Release note: None
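
A sketch of the shape of that change, assuming a boolean method on the processor interface that the planner consults when picking a columnarizer. The interface, the method name `MustBeStreaming` and the helper `columnarizerMode` are hypothetical here and may not match the real code.

```go
package main

import "fmt"

// Processor is a hypothetical, trimmed-down processor interface.
type Processor interface {
	Name() string
	// MustBeStreaming reports whether rows have to be passed along one at a
	// time instead of being buffered into batches.
	MustBeStreaming() bool
}

// planNodeProcessor models a processor that wraps a planNode of a given kind.
type planNodeProcessor struct {
	nodeKind string
}

func (p planNodeProcessor) Name() string { return p.nodeKind }

// Only the hook function node needs streaming in this model.
func (p planNodeProcessor) MustBeStreaming() bool {
	return p.nodeKind == "hookFnNode"
}

// columnarizerMode picks the wrapper used to feed the vectorized flow.
func columnarizerMode(p Processor) string {
	if p.MustBeStreaming() {
		return "streaming"
	}
	return "buffering"
}

func main() {
	for _, kind := range []string{"hookFnNode", "scanNode"} {
		p := planNodeProcessor{nodeKind: kind}
		fmt.Printf("%s -> %s columnarizer\n", p.Name(), columnarizerMode(p))
	}
}
```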

Co-authored-by: richardjcai
Co-authored-by: Jonathan Albrecht
Co-authored-by: Yahor Yuzefovich
@craig
Contributor

craig bot commented Apr 20, 2021

Build failed (retrying...):

@craig
Contributor

craig bot commented Apr 20, 2021

Build failed (retrying...):

@craig
Contributor

craig bot commented Apr 21, 2021

Build succeeded:

@craig craig bot merged commit b32bbb5 into cockroachdb:master Apr 21, 2021
@yuzefovich yuzefovich deleted the vec-const-cast branch April 21, 2021 00:10
Development

Successfully merging this pull request may close these issues.

colexec: IsDistinctFrom optimization is not used in "self-equality" case
4 participants