release-23.1: tree: apply functions to TEXT expressions compared with the @@ operator #99748
568b6cc
to
780a942
Compare
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @msirek)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @blathers-crl[bot], @jordanlewis, and @msirek)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
// a::TSQUERY @@ b::TEXT | a @@ to_tsvector(b)
// a::TSVECTOR @@ b::TEXT | a @@ b::TSQUERY
// a::TEXT @@ b::TSVECTOR | a::TSQUERY @@ b
I don't see these last two rules in the linked docs. Where did they come from?
It looks like Postgres doesn't implicitly add these casts:
marcus=# SELECT to_tsvector('fat cats ate fat rats') @@ 'fat & rat'::TEXT;
ERROR: 42883: operator does not exist: tsvector @@ text
LINE 1: SELECT to_tsvector('fat cats ate fat rats') @@ 'fat & rat'::...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE
marcus=# SELECT to_tsvector('cat rat') @@ t FROM t;
ERROR: 42883: operator does not exist: tsvector @@ text
LINE 1: SELECT to_tsvector('cat rat') @@ t FROM t;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
marcus=# SELECT 'cat | rat'::TEXT @@ to_tsvector('cat rat');
ERROR: 42883: operator does not exist: text @@ tsvector
LINE 1: SELECT 'cat | rat'::TEXT @@ to_tsvector('cat rat');
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE
marcus=# SELECT t @@ to_tsvector('cat rat') FROM t;
ERROR: 42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ to_tsvector('cat rat') FROM t;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
pkg/sql/logictest/testdata/logic_test/tsvector
line 405 at r1 (raw file):
false
statement error pq: syntax error in TSQuery: fat cats chased fat, out of shape rats
In Postgres, the error is that the text @@ tsvector operator does not exist.
pkg/sql/logictest/testdata/logic_test/tsvector
line 408 at r1 (raw file):
SELECT b @@ a::tsvector FROM ab
statement error pq: syntax error in TSQuery: fat cats chased fat, out of shape rats
In Postgres, the error is that the tsvector @@ text operator does not exist.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
I don't see these last two rules in the linked docs. Where did they come from?
From observing the behavior of Postgres for uncasted/untyped strings.
It looks like Postgres doesn't implicitly add these casts: ...
CRDB and Postgres type quoted strings differently:
CRDB:
root@localhost:26257/defaultdb> SELECT pg_typeof('fat & rat');
pg_typeof
-------------
text
Postgres:
SELECT pg_typeof('fat & rat');
pg_typeof
-----------
unknown
So, to get behavior similar to Postgres for an expression like:
to_tsvector('fat cats ate fat rats') @@ 'fat & rat'
the TEXT operand is implicitly converted to TSQuery.
To fully match Postgres, we'd have to type quoted strings as an unknown type.
Since the @@ operator has no meaning for TEXT operands without a data type conversion, there is only one sensible direction and target data type for the conversion (unlike operators such as =), so it is possible to make a clear choice.
Another option is to disallow this implicit cast and force the user to explicitly cast string literals, at the cost of erroring out on an expression like to_tsvector('fat cats ate fat rats') @@ 'fat & rat' where Postgres does not.
I don't really have a strong opinion on this. Thoughts?
Maybe @jordanlewis has an opinion.
pkg/sql/logictest/testdata/logic_test/tsvector
line 405 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
In Postgres, the error is that the text @@ tsvector operator does not exist.
See above discussion
pkg/sql/logictest/testdata/logic_test/tsvector
line 408 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
In Postgres, the error is that the tsvector @@ text operator does not exist.
See above discussion
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
We should be parsing a string literal (e.g., 'cat') as a tree.StrVal with an unknown type, which is what Postgres does:
If a type is not specified for a string literal, then the placeholder type unknown is assigned initially...
I think we forgot to add type conversion logic to StrVal.ResolveAsType, which would fix the problems you describe without adding casts.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
We should be parsing a string literal (e.g., 'cat') as a tree.StrVal with an unknown type, which is what Postgres does:
If a type is not specified for a string literal, then the placeholder type unknown is assigned initially...
I think we forgot to add type conversion logic to StrVal.ResolveAsType, which would fix the problems you describe without adding casts.
Thanks. Maybe you could say a few more words on this. Our unknown type seems to be reserved for NULL, so I'm not sure about assigning that type to a non-null string literal:
cockroach/pkg/sql/types/types.go
Lines 258 to 262 in 1a4b094
// Unknown is the type of an expression that statically evaluates to NULL.
// This type should never be returned for an expression that does not *always*
// evaluate to NULL.
Unknown = &T{InternalType: InternalType{
	Family: UnknownFamily, Oid: oid.T_unknown, Locale: &emptyLocale}}
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Thanks. Maybe you could say a few more words on this. Our unknown type seems to be reserved for NULL, so I'm not sure about assigning that type to a non-null string literal:
cockroach/pkg/sql/types/types.go
Lines 258 to 262 in 1a4b094
// Unknown is the type of an expression that statically evaluates to NULL.
// This type should never be returned for an expression that does not *always*
// evaluate to NULL.
Unknown = &T{InternalType: InternalType{
	Family: UnknownFamily, Oid: oid.T_unknown, Locale: &emptyLocale}}
A string literal is parsed as a tree.StrVal and initially has no type (or the "unknown" type). The type assigned to a string literal during type checking depends on its context. For example, in the expression 'true' OR false, the 'true' string literal never has the type TEXT (or any other string-like type). During type checking, the literal value is parsed directly into the type desired by the context - a boolean in this case. It's not cast from a TEXT to a BOOL.
The difference between a string literal being cast to a type and being parsed as a type is subtle and mostly esoteric, but it explains why this succeeds in Postgres:
SELECT 'true' OR false;
And why this fails with the error argument of OR must be type boolean, not type text:
SELECT 'true'::TEXT OR false;
It also explains why this succeeds:
SELECT 'cat | rat' @@ to_tsvector('a fat cat');
And why this fails with the error operator does not exist: text @@ tsvector:
SELECT 'cat | rat'::TEXT @@ to_tsvector('a fat cat');
In the first case, 'cat | rat' is never a TEXT - it's parsed directly as a TSQUERY. In the second case, the ::TEXT forces the LHS of the @@ to be a TEXT, which @@ does not support.
After looking into StrVal.ResolveAsType a bit more, I believe it already supports parsing a string literal directly into a TSVECTOR or TSQUERY. So I believe you can remove the casts here and the behavior will match Postgres.
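The parse-versus-cast distinction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: Type, Operand, and resolveAs are hypothetical names made up for illustration, not CockroachDB's tree.StrVal or StrVal.ResolveAsType, and the error strings are not the real Postgres or CRDB messages.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type is a toy SQL type tag for this sketch; it is not CockroachDB's types.T.
type Type string

const (
	Bool Type = "bool"
	Text Type = "text"
)

// Operand is a hypothetical stand-in for an expression being type checked.
// An empty Typ means an untyped string literal such as 'true'.
type Operand struct {
	Raw string
	Typ Type
}

// resolveAs models typing an operand in a context that wants type want.
// An untyped literal is parsed directly as want. An operand that already
// has a type must match want exactly, because this sketch deliberately
// has no implicit casts.
func resolveAs(op Operand, want Type) (Type, error) {
	if op.Typ != "" {
		if op.Typ != want {
			return "", fmt.Errorf("argument must be type %s, not type %s", want, op.Typ)
		}
		return op.Typ, nil
	}
	// Untyped literal: validate the raw text against the desired type.
	if want == Bool {
		if _, err := strconv.ParseBool(op.Raw); err != nil {
			return "", fmt.Errorf("could not parse %q as type %s", op.Raw, want)
		}
	}
	return want, nil
}

func main() {
	// Models 'true' OR false: the literal is parsed straight into a bool.
	typ, err := resolveAs(Operand{Raw: "true"}, Bool)
	fmt.Println(typ, err)

	// Models 'true'::TEXT OR false: the cast pins the type to text first,
	// so the boolean context rejects it.
	typ, err = resolveAs(Operand{Raw: "true", Typ: Text}, Bool)
	fmt.Println(typ, err)
}
```

The same shape would apply to @@: an untyped 'cat | rat' can be parsed as a TSQUERY because the context asks for one, while a value already typed as TEXT never gets that chance.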
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
A string literal is parsed as a tree.StrVal and initially has no type (or the "unknown" type). The type assigned to a string literal during type checking depends on its context. For example, in the expression 'true' OR false, the 'true' string literal never has the type TEXT (or any other string-like type). During type checking, the literal value is parsed directly into the type desired by the context - a boolean in this case. It's not cast from a TEXT to a BOOL.
The difference between a string literal being cast to a type and being parsed as a type is subtle and mostly esoteric, but it explains why this succeeds in Postgres:
SELECT 'true' OR false;
And why this fails with the error argument of OR must be type boolean, not type text:
SELECT 'true'::TEXT OR false;
It also explains why this succeeds:
SELECT 'cat | rat' @@ to_tsvector('a fat cat');
And why this fails with the error operator does not exist: text @@ tsvector:
SELECT 'cat | rat'::TEXT @@ to_tsvector('a fat cat');
In the first case, 'cat | rat' is never a TEXT - it's parsed directly as a TSQUERY. In the second case, the ::TEXT forces the LHS of the @@ to be a TEXT, which @@ does not support.
After looking into StrVal.ResolveAsType a bit more, I believe it already supports parsing a string literal directly into a TSVECTOR or TSQUERY. So I believe you can remove the casts here and the behavior will match Postgres.
Thank you. Without the casts in the code, something like this fails with "unsupported comparison operator: @@":
SELECT 'fat cats chased fat, out of shape rats' @@ 'fat rats'::tsvector
Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):
SELECT b @@ a::tsvector FROM ab
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Ahh I see. That particular example fails because it's not valid TSQuery syntax, and we incorrectly propagate the error, so it turns into an "unsupported comparison operator" error. I'd put that under the umbrella of #75101 and TODO that - I think a confusing error is less bad than inconsistent behavior.
A valid TSQuery does work without the casts, like SELECT 'fat' @@ 'fat rats'::tsvector. See my draft PR here: #100704. I can un-draft it and we can merge it if we'd like to.
Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):
SELECT b @@ a::tsvector FROM ab
In Postgres this errors, so this is the correct behavior:
marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE
marcus=# SELECT t @@ 'fat rats'::TSVECTOR FROM t;
ERROR: 42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ 'fat rats'::TSVECTOR FROM t;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
LOCATION: op_error, parse_oper.c:656
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Ahh I see. That particular example fails because it's not valid TSQuery syntax, and we incorrectly propagate the error, so it turns into an "unsupported comparison operator" error. I'd put that under the umbrella of #75101 and TODO that - I think a confusing error is less bad than inconsistent behavior.
A valid TSQuery does work without the casts, like SELECT 'fat' @@ 'fat rats'::tsvector. See my draft PR here: #100704. I can un-draft it and we can merge it if we'd like to.
Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):
SELECT b @@ a::tsvector FROM ab
In Postgres this errors, so this is the correct behavior:
marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE
marcus=# SELECT t @@ 'fat rats'::TSVECTOR FROM t;
ERROR: 42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ 'fat rats'::TSVECTOR FROM t;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
LOCATION: op_error, parse_oper.c:656
Oh, right, I think I added the CAST mainly for the non-constant expression case. Their docs indicate TEXT @@ TSVECTOR is a valid comparison, so it's a little confusing that the Postgres docs don't fully match the implementation. I'd almost like to say it's either a docs bug or an actual bug in Postgres. So the question is, do we want to match Postgres docs, or Postgres implementation.
Also, I thought the implicit cast would be useful since it's the only valid way to cast that comparison operation, though maybe it's not a common query pattern. Anyway, I suppose it's always OK to go by default with Postgres compatibility in terms of their actual implementation. I don't know, what do you think?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/sem/tree/type_check.go
line 2342 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Oh, right, I think I added the CAST mainly for the non-constant expression case. Their docs indicate TEXT @@ TSVECTOR is a valid comparison, so it's a little confusing that the Postgres docs don't fully match the implementation. I'd almost like to say it's either a docs bug or an actual bug in Postgres. So the question is, do we want to match Postgres docs, or Postgres implementation.
Also, I thought the implicit cast would be useful since it's the only valid way to cast that comparison operation, though maybe it's not a common query pattern. Anyway, I suppose it's always OK to go by default with Postgres compatibility in terms of their actual implementation. I don't know, what do you think?
Never mind, there is actually no rule in the Postgres docs about text @@ tsvector. I'll remove the CAST.
780a942
to
febcf89
Compare
The TSQuery and TSVector "matches" operator "@@" returns different results
on CRDB vs. Postgres when one of the arguments is a TEXT expression.
The rules at
https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-MATCHING
specify:
> The form text @@ tsquery is equivalent to to_tsvector(x) @@ y.
> The form text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y).
This PR adds these implicit function calls in these "matches" comparison
expressions during type checking.
Fixes #98804
Release note (bug fix): This fixes incorrect results from the text search
@@ ("matches") operator when one of the arguments is a TEXT expression and the
other argument is a TEXT or TSQuery expression.
febcf89
to
9d33048
Compare
Whichever you prefer. The current backport already removes the cast, but if we don't want to mix changes, just merge #100704 and then both that and #99583 can be backported together.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @jordanlewis, and @mgartner)
Ok, I've bors'd #100704.
Closing this PR and replacing with #100918
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @jordanlewis, and @mgartner)
Backport 1/1 commits from #99583 on behalf of @msirek.
/cc @cockroachdb/release
The TSQuery and TSVector "matches" operator "@@" returns different results
on CRDB vs. Postgres when one of the arguments is a TEXT expression.
The rules at
https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-MATCHING
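specify:
> The form text @@ tsquery is equivalent to to_tsvector(x) @@ y.
> The form text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y).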
This PR adds these implicit function calls in these "matches" comparison
expressions during type checking.
Fixes #98804
Release note (bug fix): This fixes incorrect results from the text search
@@ ("matches") operator when one of the arguments is a TEXT expression and the
other argument is a TEXT or TSQuery expression.
Release justification: Fixes incorrect results from the new @@ ("matches") operator
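As a rough illustration of the rewrite this PR describes, the two documented rules could be sketched as below. This is a toy model, not the code in pkg/sql/sem/tree/type_check.go: Type and rewriteMatch are hypothetical names, expressions are plain strings, and only typed (non-literal) operands are modeled. Any combination not listed, such as text @@ tsvector, is reported as unsupported, which matches the Postgres errors shown in the review for that case; combinations the review does not discuss are rejected here only because the sketch does not model them.

```go
package main

import "fmt"

// Type is a toy SQL type tag for this sketch; it is not CockroachDB's types.T.
type Type string

const (
	Text     Type = "text"
	TSQuery  Type = "tsquery"
	TSVector Type = "tsvector"
)

// rewriteMatch returns the text of `x @@ y` after adding the implicit
// function calls from the Postgres text search docs. x and y are expression
// strings; xt and yt are their already-resolved types.
func rewriteMatch(x string, xt Type, y string, yt Type) (string, error) {
	switch {
	case xt == TSVector && yt == TSQuery, xt == TSQuery && yt == TSVector:
		// Native forms need no rewrite.
		return x + " @@ " + y, nil
	case xt == Text && yt == TSQuery:
		// text @@ tsquery is equivalent to to_tsvector(x) @@ y.
		return "to_tsvector(" + x + ") @@ " + y, nil
	case xt == Text && yt == Text:
		// text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y).
		return "to_tsvector(" + x + ") @@ plainto_tsquery(" + y + ")", nil
	default:
		// Not modeled in this sketch; mirrors the Postgres-style message.
		return "", fmt.Errorf("operator does not exist: %s @@ %s", xt, yt)
	}
}

func main() {
	out, err := rewriteMatch("a", Text, "b", Text)
	fmt.Println(out, err)

	out, err = rewriteMatch("b", Text, "a::tsvector", TSVector)
	fmt.Println(out, err)
}
```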