Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: tree: apply functions to TEXT expressions compared with the @@ operator #99748

Closed

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Mar 28, 2023

Backport 1/1 commits from #99583 on behalf of @msirek.

/cc @cockroachdb/release


The TSQuery and TSVector "matches" operator "@@" returns different results
on CRDB vs. Postgres when one of the arguments is a TEXT expression.
The rules at
https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-MATCHING
specify:

The form text @@ tsquery is equivalent to to_tsvector(x) @@ y.
The form text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y).

This PR adds these implicit function calls in these "matches" comparison
expressions during type checking.

Fixes #98804

Release note (bug fix): This fixes incorrect results from the text search
@@ ("matches") operator when one of the arguments is a TEXT expression and the
other argument is a TEXT or TSQuery expression.


Release justification: Fixes incorrect results from the new @@ ("matches") operator

@blathers-crl blathers-crl bot requested a review from a team as a code owner March 28, 2023 01:06
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99583 branch 2 times, most recently from 568b6cc to 780a942 Compare March 28, 2023 01:06
@blathers-crl
Copy link
Author

blathers-crl bot commented Mar 28, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Mar 28, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @msirek)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @blathers-crl[bot], @jordanlewis, and @msirek)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

		// a::TSQUERY @@ b::TEXT          |  a @@ to_tsvector(b)
		// a::TSVECTOR @@ b::TEXT         |  a @@ b::TSQUERY
		// a::TEXT @@ b::TSVECTOR         |  a::TSQUERY @@ b

I don't see these last two rules in the linked docs. Where did they come from?

It looks like Postgres doesn't implicitly add these casts:

marcus=# SELECT to_tsvector('fat cats ate fat rats') @@ 'fat & rat'::TEXT;
ERROR:  42883: operator does not exist: tsvector @@ text
LINE 1: SELECT to_tsvector('fat cats ate fat rats') @@ 'fat & rat'::...
                                                    ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.


marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE

marcus=# SELECT to_tsvector('cat rat') @@ t FROM t;
ERROR:  42883: operator does not exist: tsvector @@ text
LINE 1: SELECT to_tsvector('cat rat') @@ t FROM t;
                                      ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
marcus=# SELECT 'cat | rat'::TEXT @@ to_tsvector('cat rat');
ERROR:  42883: operator does not exist: text @@ tsvector
LINE 1: SELECT 'cat | rat'::TEXT @@ to_tsvector('cat rat');
                                 ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.


marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE

marcus=# SELECT t @@ to_tsvector('cat rat') FROM t;
ERROR:  42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ to_tsvector('cat rat') FROM t;
                 ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

pkg/sql/logictest/testdata/logic_test/tsvector line 405 at r1 (raw file):

false

statement error pq: syntax error in TSQuery: fat cats chased fat, out of shape rats

In Postgres, the error is that the text @@ tsvector operator does not exist.


pkg/sql/logictest/testdata/logic_test/tsvector line 408 at r1 (raw file):

SELECT b @@ a::tsvector FROM ab

statement error pq: syntax error in TSQuery: fat cats chased fat, out of shape rats

In Postgres, the error is that the tsvector @@ text operator does not exist.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

I don't see these last two rules in the linked docs. Where did they come from?

From observing the behavior of Postgres for uncasted/untyped strings.

It looks like Postgres doesn't implicitly add these casts: ...

CRDB and Postgres type quoted strings differently:
CRDB:

root@localhost:26257/defaultdb> SELECT pg_typeof('fat & rat');                                                                                                                                    
  pg_typeof
-------------
  text

Postgres:

SELECT pg_typeof('fat & rat');
 pg_typeof 
-----------
 unknown

So, in order to achieve similar behavior to Postgres for an expression like:
to_tsvector('fat cats ate fat rats') @@ 'fat & rat'
The implicit type conversion from TEXT to TSQuery is done.
To fully match Postgres, we'd have to type quoted strings as an unknown type.
Since the @@ operator doesn't really have a meaning for comparing TEXT without data type conversion, we only really have one choice in the direction and target data type of the conversion as opposed to other operator types like =, so it is possible to make a clear choice.

It would also be an option to disallow this implicit cast and force the user to explicitly cast string literals, at the cost of erroring out
an expression like to_tsvector('fat cats ate fat rats') @@ 'fat & rat' where Postgres does not.

I don't really have a strong opinion on this. Thoughts?

Maybe @jordanlewis has an opinion.


pkg/sql/logictest/testdata/logic_test/tsvector line 405 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In Postgres, the error is that the text @@ tsvector operator does not exist.

See above discussion


pkg/sql/logictest/testdata/logic_test/tsvector line 408 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In Postgres, the error is that the tsvector @@ text operator does not exist.

See above discussion

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):
We should be parsing a string literal (e.g., 'cat') as a tree.StrVal with an unknown type - which is what Postgres does:

If a type is not specified for a string literal, then the placeholder type unknown is assigned initially...

I think we forgot to add type conversion logic to StrVal.ResolveAsType which would fix the problems you describe without adding casts.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We should be parsing a string literal (e.g., 'cat') as a tree.StrVal with an unknown type - which is what Postgres does:

If a type is not specified for a string literal, then the placeholder type unknown is assigned initially...

I think we forgot to add type conversion logic to StrVal.ResolveAsType which would fix the problems you describe without adding casts.

Thanks. Maybe you could say a few more words on this. Our unknown type seems to be reserved for NULL, so I'm not sure about assigning that type to a non-null string literal:

// Unknown is the type of an expression that statically evaluates to NULL.
// This type should never be returned for an expression that does not *always*
// evaluate to NULL.
Unknown = &T{InternalType: InternalType{
Family: UnknownFamily, Oid: oid.T_unknown, Locale: &emptyLocale}}

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Thanks. Maybe you could say a few more words on this. Our unknown type seems to be reserved for NULL, so I'm not sure about assigning that type to a non-null string literal:

// Unknown is the type of an expression that statically evaluates to NULL.
// This type should never be returned for an expression that does not *always*
// evaluate to NULL.
Unknown = &T{InternalType: InternalType{
Family: UnknownFamily, Oid: oid.T_unknown, Locale: &emptyLocale}}

A string literal is parsed as a tree.StrVal and initially has no type (or the "unknown" type). The type assigned to a string literal during type checking is dependent on it's context. For example, in the expression 'true' OR false, the 'true' string literal never has the type TEXT (or any other string-like type). During type checking, the literal value is parsed directly into the type desired by the context - a boolean in this case. It's not casted from a TEXT to a BOOL.

The difference between a string literal being casted to a type and being parsed as a type are subtle and mostly esoteric, but it explains why this succeeds in Postgres:

SELECT 'true' OR false;

And why this fails with the error argument of OR must be type boolean, not type text:

SELECT 'true'::TEXT OR false;

It also explains why this succeeds:

SELECT 'cat | rat' @@ to_tsvector('a fat cat');

And why this fails with the error operator does not exist: text @@ tsvector:

SELECT 'cat | rat'::TEXT @@ to_tsvector('a fat cat');

In the first case, 'cat | rat' is never a TEXT - it's parsed directly as a TSQUERY. In the second case, the ::TEXT forces the LHS of the @@ to be a TEXT, which @@ does not support.

After looking into StrVal.ResolveAsType a bit more, I believe it already supports parsing a string literal directly into a TSVECTOR or TSQUERY. So I believe you can remove the casts here and the behavior will match Postgres.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

A string literal is parsed as a tree.StrVal and initially has no type (or the "unknown" type). The type assigned to a string literal during type checking is dependent on it's context. For example, in the expression 'true' OR false, the 'true' string literal never has the type TEXT (or any other string-like type). During type checking, the literal value is parsed directly into the type desired by the context - a boolean in this case. It's not casted from a TEXT to a BOOL.

The difference between a string literal being casted to a type and being parsed as a type are subtle and mostly esoteric, but it explains why this succeeds in Postgres:

SELECT 'true' OR false;

And why this fails with the error argument of OR must be type boolean, not type text:

SELECT 'true'::TEXT OR false;

It also explains why this succeeds:

SELECT 'cat | rat' @@ to_tsvector('a fat cat');

And why this fails with the error operator does not exist: text @@ tsvector:

SELECT 'cat | rat'::TEXT @@ to_tsvector('a fat cat');

In the first case, 'cat | rat' is never a TEXT - it's parsed directly as a TSQUERY. In the second case, the ::TEXT forces the LHS of the @@ to be a TEXT, which @@ does not support.

After looking into StrVal.ResolveAsType a bit more, I believe it already supports parsing a string literal directly into a TSVECTOR or TSQUERY. So I believe you can remove the casts here and the behavior will match Postgres.

Thank you. Without the casts in the code, something like this fails with "unsupported comparison operator: @@ :

SELECT 'fat cats chased fat, out of shape rats' @@ 'fat rats'::tsvector

Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):

SELECT b @@ a::tsvector FROM ab

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):
Ahh I see. That particular example fails because it's not valid TSQuery syntax, and we incorrectly propagate the error, so it turns into an "unsupported comparison operator" error. I'd put that under the umbrella of #75101 and TODO that - I think a confusing error is less bad than inconsistent behavior.

A valid TSQuery does work without the casts, like SELECT 'fat' @@ 'fat rats'::tsvector. See my draft PR here: #100704. I can un-draft it and we can merge it if we'd like to.

Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):
SELECT b @@ a::tsvector FROM ab

In Postgres this errors, so this is the correct behavior:

marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE

marcus=# SELECT t @@ 'fat rats'::TSVECTOR FROM t;
ERROR:  42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ 'fat rats'::TSVECTOR FROM t;
                 ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
LOCATION:  op_error, parse_oper.c:656

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Ahh I see. That particular example fails because it's not valid TSQuery syntax, and we incorrectly propagate the error, so it turns into an "unsupported comparison operator" error. I'd put that under the umbrella of #75101 and TODO that - I think a confusing error is less bad than inconsistent behavior.

A valid TSQuery does work without the casts, like SELECT 'fat' @@ 'fat rats'::tsvector. See my draft PR here: #100704. I can un-draft it and we can merge it if we'd like to.

Also, even if the constants were handled correctly some other place in the code, non-constant expressions couldn't be supported without the cast (b has TEXT type):
SELECT b @@ a::tsvector FROM ab

In Postgres this errors, so this is the correct behavior:

marcus=# CREATE TABLE t (t TEXT);
CREATE TABLE

marcus=# SELECT t @@ 'fat rats'::TSVECTOR FROM t;
ERROR:  42883: operator does not exist: text @@ tsvector
LINE 1: SELECT t @@ 'fat rats'::TSVECTOR FROM t;
                 ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
LOCATION:  op_error, parse_oper.c:656

Oh, right, I think I added the CAST mainly for the non-constant expression case. Their docs indicate TEXT @@ TSVECTOR is a valid comparison, so it's a little confusing that the Postgres docs don't fully match the implementation. I'd almost like to say it's either a docs bug or an actual bug in Postgres. So the question is, do we want to match Postgres docs, or Postgres implementation.

Also, I thought the implicit cast would be useful since it's the only valid way to cast that comparison operation, though maybe it's not a common query pattern. Anyway, I suppose it's always OK to go by default with Postgres compatibility in terms of their actual implementation. I don't know, what do you think?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/sem/tree/type_check.go line 2342 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Oh, right, I think I added the CAST mainly for the non-constant expression case. Their docs indicate TEXT @@ TSVECTOR is a valid comparison, so it's a little confusing that the Postgres docs don't fully match the implementation. I'd almost like to say it's either a docs bug or an actual bug in Postgres. So the question is, do we want to match Postgres docs, or Postgres implementation.

Also, I thought the implicit cast would be useful since it's the only valid way to cast that comparison operation, though maybe it's not a common query pattern. Anyway, I suppose it's always OK to go by default with Postgres compatibility in terms of their actual implementation. I don't know, what do you think?

Never mind, there is actually no rule in the Postgres docs about text @@ tsvector. I'll remove the CAST.

@msirek msirek force-pushed the blathers/backport-release-23.1-99583 branch from 780a942 to febcf89 Compare April 5, 2023 17:41
The TSQuery and TSVector "matches" operator "@@" returns different results
on CRDB vs. Postgres when one of the arguments is a TEXT expression.
The rules at
https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-MATCHING
specify:
> The form text @@ tsquery is equivalent to to_tsvector(x) @@ y.
> The form text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y).

This PR adds these implicit function calls in these "matches" comparison
expressions during type checking.

Fixes #98804

Release note (bug fix): This fixes incorrect results from the text search
@@ ("matches") operator when one of the arguments is a TEXT expression and the
other argument is a TEXT or TSQuery expression.
@msirek msirek force-pushed the blathers/backport-release-23.1-99583 branch from febcf89 to 9d33048 Compare April 5, 2023 17:44
@mgartner
Copy link
Collaborator

mgartner commented Apr 6, 2023

Never mind, there is actually no rule in the Postgres docs about text @@ tsvector. I'll remove the CAST.

Should we instead merge #100704 and backport it after this backport?

Edit: Or merge #100704 and backport them together?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whichever you prefer. The current backport already removes the cast, but if we don't want to mix changes, just merge #100704 and then both that and #99583 can be backported together.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @jordanlewis, and @mgartner)

@mgartner
Copy link
Collaborator

mgartner commented Apr 6, 2023

Ok, I've bors'd #100704.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing this PR and replacing with #100918

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @jordanlewis, and @mgartner)

@msirek msirek closed this Apr 7, 2023
@mgartner mgartner deleted the blathers/backport-release-23.1-99583 branch April 17, 2023 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants