Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for sql variable inside query in snowflake and mysql dialect #265

Closed
wants to merge 1 commit into from

Conversation

@eyalleshem eyalleshem force-pushed the snowflake_variable_name branch from a998778 to 51a2627 Compare August 16, 2020 16:22
@coveralls
Copy link

coveralls commented Aug 16, 2020

Pull Request Test Coverage Report for Build 210968877

  • 47 of 47 (100.0%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.08%) to 92.02%

Totals Coverage Status
Change from base Build 205645794: 0.08%
Covered Lines: 4601
Relevant Lines: 5000

💛 - Coveralls

@nickolay
Copy link
Contributor

Sorry for the long delay.

This was previously discussed in #48, where using a custom dialect, in which $ can start an identifier, was deemed a good enough solution.

@alex-dukhno recently noted that a custom dialect would fail all the dialect_of! checks we've started to add, so perhaps it is time to reconsider.

Doing this in the parser results in accepting $ var as a variable, which is weird. I guess you did that to support MySQL's @"quoted identifier" notation, but still it seems like this logic belongs in the tokenizer.

I'd appreciate it if the PR included the logic we're trying to implement, instead of simply a link to the docs. For snowflake it seems that the relevant bits are the following, and we're focused on implementing the first two only?

  • all variables must be prefixed with a $ sign (the documentation does not explain what can follow the dollar sign though..)
  • can be used in Snowflake anywhere a literal constant is allowed
  • Variables can also contain identifier names, such as table names (e.g. SELECT * FROM IDENTIFIER($MY_VARIABLE))

@eyalsatori
Copy link
Contributor

eyalsatori commented Oct 5, 2020

About the custom dialect i think about 2 options :

  1. Maybe it's would be better to use the dialect_of macro only in the parser - and if we need a specific behaviour in the Tokenizer , maybe it will be better to add another function to the "Dialect" trait.
  2. Maybe we could add to the trait some kind of "follow_dialect" function - and change the "dialect_of" macro to return true if it is the current dialect or dialect that following the current dialect .

About the current PR - Do you think it's will be better if we take the whole variable as a single token ? (meanwhile with the current dialect_of macro ..)

About snowflake - i don't think that i want to treat the third case different then the others , I think the "IDENTIFIER" should be parsed as a function , and the value should be expression with kind of sql-variable.

@nickolay
Copy link
Contributor

nickolay commented Oct 6, 2020

About the general points you raised:

  • "use the dialect_of macro [... or ...] add another function to the "Dialect" trait." -- I believe we should use dialect_of! by default to handle differences between the dialects we support directly, and consider other solutions when there's a problem with dialect_of.

    I agree tokenizer will probably end up not using dialect_of! much, but designing an alternative compatible with all the dialects requires more upfront research (that's why I was OK with merging [snowflake] Support single line comments starting with '#' or '//' #264, which used dialect_of in the tokenizer)

  • The follow_dialect idea is somewhat off-topic for this issue, unless you brought it up as another workaround for us not supporting bind variables. If you or someone else need something like follow_dialect specifically, I'd like to discuss this in a separate issue.

About the current PR - Do you think it's will be better if we take the whole variable as a single token ?

This seems more appropriate, yes, given the $ var (with the space between the dollar sign and the variable name) issue I brought up. But again, I'd like us to start with defining the problem we're trying to solve.

For instance implementing the "can be used in Snowflake anywhere a literal constant is allowed" requirement will require rather invasive changes to the parser. Considering other dialects will require more research.

If you want to implement a subset of the Snowflake dialect that recognizes $vars in expression context only for now, that can be achieved relatively easily. This is how I would do it:

  • Define what characters can follow $ in a variable name, and use dialect_of! to conditionally parse it into a Variable token.
  • Parse the Variable token in parse_value to a new Value variant
  • Match on the Variable token along with Number and others unconditionally in parse_prefix.

@alex-dukhno
Copy link
Contributor

Hi @nickolay

developing PostgreSQL protocol compatible database I collect some knowledge around $var for Postgres

  1. $ can follow only by a numbers, starting from 1. Query examples using official doc:
    • prepare n (int2) as insert into numbers values ($abc); gives you ERROR: syntax error at or near "$"
    • after query prepare n (int2) as insert into numbers values ($2);
      • if you try to execute execute n(1); you will see ERROR: wrong number of parameters for prepared statement "n" DETAIL: Expected 2 parameters but got 1.
      • however, if you execute execute n(1, 10); 10 will be inserted into a table
  2. By intuition, it should follow $[parameter_index] pattern where parameter_index is index in values from query like execute n(1, 10);
  3. It can be used in insert queries as variable values e.g. insert into <table_name> values ($1, $2);
  4. It can be used in update queries in SET <column_name>=$1 expressions
  5. It can be used in where, join and having predicates.
  6. I've checked select $1 it also works
  7. It doesn't work with SET <param_name> = e.g. prepare set_stmt as SET extra_float_digits = $1; results into ERROR: syntax error at or near "SET"

I am wondering if based on above info your suggestion:

If you want to implement a subset of the Snowflake dialect that recognizes $vars in expression context only for now, that can be achieved relatively easily. This is how I would do it:

  • Define what characters can follow $ in a variable name, and use dialect_of! to conditionally parse it into a Variable token.
  • Parse the Variable token in parse_value to a new Value variant
  • Match on the Variable token along with Number and others unconditionally in parse_prefix.

could be applied to PostgreSqlDialect?

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @eyalleshem

@andygrove
Copy link
Member

@eyalleshem Could you rebase please so I can merge this?

@alamb
Copy link
Contributor

alamb commented Aug 20, 2021

Hi @eyalleshem -- sorry for the delay in review. I am going to help out now with this repo and we are working to clear the backlog. Is this PR still something you would like to work on to help contribute?

@alamb
Copy link
Contributor

alamb commented Sep 9, 2021

I am closing what look like stale PRs in this repo; I apologize in advance if this is a mistake -- please feel free to reopen if you want to keep working on this issue.

@alamb alamb closed this Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants