Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unary '+' operators for strings #3013

Conversation

robverschoor
Copy link
Contributor

@robverschoor robverschoor commented Oct 9, 2024

Description

The T-SQL grammar allows an arbitrary number of unary + operators to precede an expression,
but PG only supports that for numeric expressions. For string expressions, the + will be parsed fine by the T-SQL ANTLR parser, but these will raise an error in PG.
In SQL those unary + string operators may look like they are additional concatenation operators, like: SELECT 'a' ++ 'b'. Expressions such as +++(+++@v)) are also valid syntax according to the T-SQL grammar even though they look unusual.
In this fix we remove such unary + operators, which are redundant anyway.
However we do not touch numeric constants (e.g. +123) since the +, though still redundant, may
have been included for code clarity (e.g. +123 as opposed to -123).

Signed-off-by: Rob Verschoor [email protected]

Issues Resolved

BABEL-1924 Support unary '+' operator for string expressions

Test Scenarios Covered

  • Use case based - Yes

  • Boundary conditions - N/A

  • Arbitrary inputs - N/A

  • Negative test cases - Yes

  • Minor version upgrade tests - N/A

  • Major version upgrade tests - N/A

  • Performance tests - N/A

  • Tooling impact - N/A

  • Client tests - N/A

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coveralls
Copy link
Collaborator

coveralls commented Oct 9, 2024

Pull Request Test Coverage Report for Build 11307572895

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 30 of 30 (100.0%) changed or added relevant lines in 1 file are covered.
  • 135 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 74.497%

Files with Coverage Reduction New Missed Lines %
contrib/babelfishpg_common/src/typecode.c 2 96.57%
contrib/babelfishpg_tsql/src/pltsql_coerce.c 133 82.07%
Totals Coverage Status
Change from base Build 11256391296: 0.02%
Covered Lines: 45184
Relevant Lines: 60652

💛 - Coveralls

*/
std::string op = getFullText(ctx->op);
if (op.front() == '+') {
auto rhsctx = ctx->expression();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does rhs stands for in this context ?

Copy link
Contributor Author

@robverschoor robverschoor Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RHS=right-hand side ; LHS=left-hand side (of an expression/equation)
See https://en.wikipedia.org/wiki/Sides_of_an_equation

* In SQL this shows as redundant operators, for example for concatenation: SELECT 'a' ++ 'b'. Expressions
* such as +++(+++@v)) are also valid syntax according to the T-SQL grammar even though they look unusual.
* Here we remove such unary '+' operators, which are redundant anyway.
* However we do not touch numeric constants (e.g. +123) since the '+', although still redundant, may
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mentioned we do not touch numeric constants, how about the numeric variables ?

Copy link
Contributor Author

@robverschoor robverschoor Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All expressions, including numeric ones (except numeric constants), have their unary + removed. If it is a unary operator, then the + is always redundant. This applies to variables, bracketed expressions as well as subqueries (see the test cases).
The reason we don't want to touch +123 is that this is a commonly known and meaningful notation by itself, which you learn in school, just like -123; +@v in contrast, is not. Arguably, there is a subjective element to this reasoning but the idea is to avoid modifying such familiar-looking numeric constants in case they show up somewhere, like in captured SQL code.

@forestkeeper
Copy link
Contributor

Does this change only work for string constant ? What about string variables ?

Comment on lines +3247 to +3253
(rhs.front() == '\'') || // single-quoted strings
(rhs.front() == '"') || // both double-quoted strings and double-quoted identifiers
(rhs.front() == '@') || // variables
(rhs.front() == '(') || // bracketed expressions
(rhs.front() == '[') || // bracket-delimited identifiers
(rhs.front() == '_') || // identifiers starting with an underscore
std::isalpha(rhs.front()) // identifiers as well as the N'...' string notation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we trying to do here? Why only these characters are considered?

Copy link
Contributor Author

@robverschoor robverschoor Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These characters match anything except a numeric constant. Technically +(123) is also a numeric constant in T-SQL grammar terms, but at school we learned to interpret this as (+1)*(123) so removing the + is fine here, per the philosophy as outlined in the earlier comment above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can't we check only if the next char is numeric? Something like this:

if (rhs.front() == '+')  {
   // do something
}
else if (!std::isdigit(rhs.front())) {
    stream.setText(ctx->op->getStartIndex(), " ");	
    break;
}
break;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would also have to be a test on the first char being '.' (e.g. +.5).
The advantage of the current coding explains what sort of expression we're removing the '+' from, which I find more helpful. But that's partly a matter of taste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, my point was to make the conditions a little simpler but anyways if it works this way then I am fine with it as well.

@robverschoor
Copy link
Contributor Author

Does this change only work for string constant ? What about string variables ?

Also see comment above.
The rewrite is applied to all unary + operators, except for numeric constants. We can recognize numeric constants because basically they start with a digit. At the ANTLR stage , we cannot perform datatype resolution so we cannot determine what datatype a variable is.

Comment on lines +3247 to +3253
(rhs.front() == '\'') || // single-quoted strings
(rhs.front() == '"') || // both double-quoted strings and double-quoted identifiers
(rhs.front() == '@') || // variables
(rhs.front() == '(') || // bracketed expressions
(rhs.front() == '[') || // bracket-delimited identifiers
(rhs.front() == '_') || // identifiers starting with an underscore
std::isalpha(rhs.front()) // identifiers as well as the N'...' string notation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, my point was to make the conditions a little simpler but anyways if it works this way then I am fine with it as well.

@forestkeeper forestkeeper merged commit 8ad86ff into babelfish-for-postgresql:BABEL_4_X_DEV Oct 14, 2024
46 checks passed
@robverschoor robverschoor deleted the unary_plus_op_string branch October 14, 2024 16:01
robverschoor added a commit to robverschoor/babelfish_extensions that referenced this pull request Dec 18, 2024
rishabhtanwar29 pushed a commit that referenced this pull request Dec 18, 2024
This reverts commit 8ad86ff.
[PR 3013](#3013) incorrectly ended up in BABEL_4_4_STABLE.

Signed-off-by: Rob Verschoor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants