Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.1: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs #56267

Closed
wants to merge 2 commits into from

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Nov 3, 2020

Backport 2/3 commits from #49891.

/cc @cockroachdb/release

I am backporting because it also fixes an error caused when performing comparison between collated strings.


physicalplan: preevaluate subqueries on LocalExprs

When the plan is local, we do not serialize expressions. Previously, in
such a case we would also not preevaluate the subqueries in the
expressions which made EXPLAIN (VEC) return unexpected plan (there
would tree.Subquery in the expression which we don't support in the
vectorized, so we would wrap the plan). Now we will preevalute the
subqueries before storing in the processor spec. AFAICT it affects only
this explain variant and nothing else.

Release note: None

physicalplan: always store LocalExpr

Previously, we would set either LocalExpr (unserialized expression,
only when we have the full plan on a single node) or Expr (serialized
expression, when we have distributed plan as well as in some tests).
However, we could be setting both and making best effort to reuse
unserialized LocalExpr on the gateway even if the plan is distributed.
And this commit adds such behavior.

Fixes: #49810.

Release note: None

When the plan is local, we do not serialize expressions. Previously, in
such a case we would also not preevaluate the subqueries in the
expressions which made `EXPLAIN (VEC)` return unexpected plan (there
would `tree.Subquery` in the expression which we don't support in the
vectorized, so we would wrap the plan). Now we will preevalute the
subqueries before storing in the processor spec. AFAICT it affects only
this explain variant and nothing else.

Release note: None
Previously, we would set either `LocalExpr` (unserialized expression,
only we have the full plan on a single node) or `Expr` (serialized
expression, when we have distributed plan as well as in some tests).
However, we could be setting both and making best effort to reuse
unserialized `LocalExpr` on the gateway even if the plan is distributed.
And this commit adds such behavior.

Release note: None
@rafiss rafiss requested a review from yuzefovich November 3, 2020 22:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

We should also add a regression test with the repro (possibly as a separate PR against master that is backported both to 20.2 and 20.1).

Reviewed 1 of 1 files at r1, 6 of 6 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 4, 2020

@yuzefovich i agree, i will make a regression test.

for this backport, do you know if this change is expected? a query from TestExecBuild/local/subquery changed from vectorized=false to true.

and do you know more about this failure?

=== CONT  TestLogic/local/sqlsmith
panic: ReturnType called on TypedExpr with empty typeAnnotation. Was the underlying Expr type-checked before asserting a type of TypedExpr? [recovered]
	panic: panic while executing 1 statements: UPDATE _ SET _ = (SELECT _) WHERE _ = _; caused by ReturnType called on TypedExpr with empty typeAnnotation. Was the underlying Expr type-checked before asserting a type of TypedExpr?

@yuzefovich
Copy link
Member

The change on subquery test is explained by the first commit and is expected (the test was probably updated on master at the time before the PR went in), but I don't know what's up with the type-checking panic.

I also now think that we need to backport the second commit from the original PR - without it, it is possible for us to get into a situation when SupportsVectorized check succeeds, but then on the remote node we cannot setup the flow which will cause the query fail (unlike the expected behavior in such scenario when SupportsVectorized check fails). It'll probably be not a clean backport. Let me know if you want me to take over the backport.

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 4, 2020

thanks @yuzefovich i tried the backport but i don't feel confident that i can do it correctly. can you please take it over?

i'll still write the regression test

@yuzefovich
Copy link
Member

Sounds good, I'll work on the backport and debug the issue to see why the PR fixed it.

@yuzefovich yuzefovich closed this Nov 4, 2020
@yuzefovich
Copy link
Member

Hm, actually backporting the whole PR doesn't fix the issue - I can still repro the problem with all 3 commits cherry-picked.

I've been debugging the issue for a bit, and the root cause of the issue is that we're using lex.EncodeLocaleName in order to serialize the collation locale, and that method changes dash characters into underscore characters. Another issue is that for some reason during parsing the serialized expression all characters in the collation locale string are also switched to the lower case. I've prototyped this diff that does solve the problem:

diff --git a/pkg/sql/logictest/testdata/logic_test/collatedstring b/pkg/sql/logictest/testdata/logic_test/collatedstring
index 07ce1b45bf..fd708662f2 100644
--- a/pkg/sql/logictest/testdata/logic_test/collatedstring
+++ b/pkg/sql/logictest/testdata/logic_test/collatedstring
@@ -368,3 +368,10 @@ CREATE TABLE t46570(c0 BOOL, c1 STRING COLLATE en);
 CREATE INDEX ON t46570(rowid, c1 DESC);
 INSERT INTO t46570(c1, rowid) VALUES('' COLLATE en, 0);
 UPSERT INTO t46570(rowid) VALUES (0), (1)
+
+# Regression for #56257 (mismatch in collation locales due to serialization
+# changes).
+statement ok
+CREATE TABLE nocase_strings (greeting STRING COLLATE "en-US-u-ks-level2");
+INSERT INTO nocase_strings VALUES ('Hello, friend.' COLLATE "en-US-u-ks-level2"), ('Hi. My name is Petee.' COLLATE "en-US-u-ks-level2");
+SELECT * FROM nocase_strings WHERE greeting = ('hi. my name is petee.' COLLATE "en-US-u-ks-level2")
diff --git a/pkg/sql/sem/tree/datum.go b/pkg/sql/sem/tree/datum.go
index 5d7d069f46..f6bd3a5443 100644
--- a/pkg/sql/sem/tree/datum.go
+++ b/pkg/sql/sem/tree/datum.go
@@ -1261,7 +1261,7 @@ func (d *DCollatedString) Compare(ctx *EvalContext, other Datum) int {
                return 1
        }
        v, ok := UnwrapDatum(ctx, other).(*DCollatedString)
-       if !ok || d.Locale != v.Locale {
+       if !ok || !d.ResolvedType().Equivalent(other.ResolvedType()) {
                panic(makeUnsupportedComparisonMessage(d, other))
        }
        return bytes.Compare(d.Key, v.Key)
diff --git a/pkg/sql/types/types.go b/pkg/sql/types/types.go
index 9fdc60fe1c..e8f3aeac4b 100644
--- a/pkg/sql/types/types.go
+++ b/pkg/sql/types/types.go
@@ -1423,8 +1423,13 @@ func (t *T) Equivalent(other *T) bool {
 
        switch t.Family() {
        case CollatedStringFamily:
-               if t.Locale() != "" && other.Locale() != "" && t.Locale() != other.Locale() {
-                       return false
+               if t.Locale() != "" && other.Locale() != "" {
+                       var tbuf, otherbuf bytes.Buffer
+                       lex.EncodeLocaleName(&tbuf, t.Locale())
+                       lex.EncodeLocaleName(&otherbuf, other.Locale())
+                       if strings.ToLower(tbuf.String()) != strings.ToLower(otherbuf.String()) {
+                               return false
+                       }
                }
 
        case TupleFamily:

Since these issues fall under the SQL Experience team, @rafiss I'd like for you to take over fixing the problem.

I'm guessing that there were other changes in 20.2 branch that together with #49891 happen to go around the problem, and I have a feeling that it actually might be still lurking on the master branch.

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 5, 2020

Thanks for your detailed investigation! I can own the fix again. Sorry for sending you down a rabbit hole with this backport. I hope I didn't mess up the git bisect...

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 5, 2020

Though to clarify, the error unsupported comparison operator: <collatedstring{en-US-u-ks-level2}> = <collatedstring{en-us-u-ks-level2}> is not as much of a problem as the panic.

@yuzefovich
Copy link
Member

Does the issue actually cause a panic in some cases? My understanding is that it'll always be an internal error with the stacktrace printed out (and a sentry report on release binaries), but the node doesn't actually crash.

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 5, 2020

Yeah it can cause a panic, at least according to the support ticket. So far I have only seen an internal error actually. I will create a public issue to track this and document further progress/reproduction steps there. #56335

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 5, 2020

Hm, actually backporting the whole PR doesn't fix the issue - I can still repro the problem with all 3 commits cherry-picked.

I tested this backport locally and it does fix the issue.

@rafiss
Copy link
Collaborator Author

rafiss commented Nov 5, 2020

Ah I see, it fails with fakedist, but works with local. That makes sense, based on what you were saying about parsing the serialized expression turning the locale into lower case.

@rafiss rafiss deleted the backport20.1-49891 branch November 6, 2020 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants