Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added parsing for PostgreSQL operations #267

Merged
merged 4 commits into from
Sep 30, 2020

Conversation

alex-dukhno
Copy link
Contributor

Added support to parse mathematical operations for PostgreSQL
See table 9.4

@coveralls
Copy link

coveralls commented Aug 30, 2020

Pull Request Test Coverage Report for Build 278078999

  • 80 of 90 (88.89%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 91.872%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/parser.rs 24 26 92.31%
tests/sqlparser_postgres.rs 25 28 89.29%
src/tokenizer.rs 17 22 77.27%
Totals Coverage Status
Change from base Build 242635691: -0.04%
Covered Lines: 4668
Relevant Lines: 5081

💛 - Coveralls

@alex-dukhno
Copy link
Contributor Author

@nickolay can you please take a look?

@nickolay
Copy link
Contributor

nickolay commented Sep 7, 2020

This seems good. Could you fix the conflicts with the recent merge, and use the dialect_of! macro to limit the new operations to the PostgreSQL dialect?

@alex-dukhno alex-dukhno force-pushed the PG-specific-operations branch from 5e10970 to 0df8011 Compare September 7, 2020 15:25
@alex-dukhno
Copy link
Contributor Author

@nickolay can you have a look?
thank you

src/tokenizer.rs Outdated
@@ -406,7 +434,18 @@ impl<'a> Tokenizer<'a> {
'|' => {
chars.next(); // consume the '|'
match chars.peek() {
Some('|') => self.consume_and_return(chars, Token::StringConcat),
Some('/') if dialect_of!(self is PostgreSqlDialect) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickolay, am I right that this is not going to work with custom dialect as you suggest here #243 (comment)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I don't follow. In the comment you mention I said I didn't think that simply allowing $ to start an identifier matched what the PosgreSQL documentation said about this.

In this PR you make |/ conditionally parse as SquareRoot, which makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have custom Dialect https://github.com/alex-dukhno/database/blob/master/src/parser/src/lib.rs#L221-L231, is it true that it wouldn't parse |/ into SquareRoot because of if dialect_of!(self is PostgreSqlDialect)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now. Yes, I assumed |/ is something very Postgres-specific, so limiting it to PostgreSqlDialect (and GenericDialect, which I didn't mention explicitly) seemed to make sense.

In the older comment you're referring to I described a stop-gap measure that would be used until someone figured out the proper way to deal with $... in the parser.

We could make the new operators parse unconditionally for now if that makes your life easier, but without a clear use-case (and accompanying testcase) this will likely break in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to have dialect_of checks and have a solution for dealing with $.... Having my knowledge of crate I'd suggest to add Variable(Ident) or something similar to ast::Expr enum. It has to be done in a separate PR of course. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have @-mentioned you in #265, where this discussion can continue.

The hard part is describing what specifically we're trying to implement and why. Your suggestion, for example, solves the problem of representing $ (but not another leader) followed by an identifier (quoted or not; consisting of characters from an unspecified set) anywhere an arbitrary expression is allowed (but not in more restricted contexts, like OFFSET __ ROWS). I'm not sure if this matches Postgres or the "common denominator" of the usual dialects; answering that requires some research. From the examples I've seen so far, a new Value variant with {leader, ident} fields might be more appropriate.

Copy link
Contributor

@nickolay nickolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for taking so long!

src/tokenizer.rs Outdated
@@ -406,7 +434,18 @@ impl<'a> Tokenizer<'a> {
'|' => {
chars.next(); // consume the '|'
match chars.peek() {
Some('|') => self.consume_and_return(chars, Token::StringConcat),
Some('/') if dialect_of!(self is PostgreSqlDialect) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I don't follow. In the comment you mention I said I didn't think that simply allowing $ to start an identifier matched what the PosgreSQL documentation said about this.

In this PR you make |/ conditionally parse as SquareRoot, which makes sense to me.

src/ast/operator.rs Outdated Show resolved Hide resolved
tests/sqlparser_postgres.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/operator.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated
Comment on lines 302 to 304
Token::CubeRoot => Ok(Expr::UnaryOp {
op: UnaryOperator::PGCbrt,
expr: Box::new(self.parse_subexpr(0)?),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not right, as this parses WHERE ||/27 = 3 as ||/ ( 27=3 ), which is not a valid expression. I don't see precedence mentioned in the documentation you linked to, and I think a better guess would be to default all the unary operators to the PLUS_MINUS_PREC (moving the new ops to the tok @ case below).

src/parser.rs Outdated Show resolved Hide resolved
src/tokenizer.rs Outdated Show resolved Hide resolved
tests/sqlparser_postgres.rs Outdated Show resolved Hide resolved
tests/sqlparser_postgres.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@nickolay nickolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to merge this, waiting for your decision on removing the dialect-specific parsing.

src/parser.rs Outdated Show resolved Hide resolved
Comment on lines 736 to +740
} else if Token::DoubleColon == tok {
self.parse_pg_cast(expr)
} else if Token::ExclamationMark == tok {
// PostgreSQL factorial operation
Ok(Expr::UnaryOp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd consider how postfix factorial interacted with other operators (the postfix ::type, another factorial, and prefix operators), but I don't think this has to block merging this PR.

@nickolay nickolay merged commit 926b03a into apache:main Sep 30, 2020
@nickolay
Copy link
Contributor

Excellent, thank you!

@alex-dukhno alex-dukhno deleted the PG-specific-operations branch September 30, 2020 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants