-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change SQL dialect to PostgreSQL #214
Conversation
Cool, thanks @returnString . To me it sounds like a good decision moto remove support for the ScalarVariable. I am not sure there is a user for the functionality? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the removal of ScalarVariable
may be a problem.
@@ -916,35 +898,6 @@ mod tests { | |||
Ok(()) | |||
} | |||
|
|||
#[tokio::test] | |||
async fn create_variable_expr() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these were added in apache/arrow#8135 . Perhaps @wqc200 has some comment about how / if this feature is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove support for scalar variables in the logical plan. This seems unrelated to changing the default SQL dialect. Users can use ScalarVariable without using SQL.
@@ -21,7 +21,7 @@ | |||
|
|||
use sqlparser::{ | |||
ast::{ColumnDef, ColumnOptionDef, Statement as SQLStatement, TableConstraint}, | |||
dialect::{keywords::Keyword, Dialect, GenericDialect}, | |||
dialect::{keywords::Keyword, Dialect, PostgreSqlDialect}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach might be to make the SQL dialect be configurable on ExecutionConfig
so that users could choose what dialect they wanted to mimic: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/context.rs#L606
I think @andygrove has said in the past using DataFusion to mimic engines such as MySQL is a good usecase too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting approach - we'd have to be careful about our sqlparser => expr conversions, but that could be quite useful as a feature 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it more over lunch: if we enable a bring-your-own-dialect setup, we'd need a decent testing strategy to support this. In this example, @someval
will be parsed as UnaryOp { op: PGAbs, expr: Ident("someval") }
for Postgres, so we'd need to decide how we implement DF-specific parsing overrides, e.g. as used to support this var provider system, on a per-dialect basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does sound fairly complicated.... I personally / project wise don't have a need for a MySQL specific mimic (postgres is good enough for us in IOx) but I think perhaps we should let others weigh in here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky for sure. I would be fine with having Postgres as the default and officially supported (well tested) dialect, while also allowing users to provide a dialect at their own risk (and have this be well documented) but even that might create an undue burden on maintainers. It would definitely be good to try and find out more about our user's requirements here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be totally clear, I'm definitely not married to the contents of this PR as they stand; if the consensus is that we do want to support multiple dialects, I'm happy to rework it towards that goal 🙂
If that's a route we want to explore in depth, we could perhaps build some sort of dialect config setup that allows for controlling parser overrides in DF, but that feels a bit too heavyweight if we don't have tonnes of use cases right now (and hence no-one to drive or own that work).
Here's my immediate idea:
- revert the deletion of all the logical plan stuff here, and just retain the parser override deletion
- make the dialect default to Postgres, but be configurable, with a doc warning mentioning the potential risks and edge cases
- mention nullary functions as a replacement for scalar variables in SQL in the next set of release notes
- random bonus thought: maybe expose a unary function to retrieve scalar variables by name from SQL?
Does that sound vaguely sensible?
Full disclosure: my goal here is to get a DataFusion-powered service queryable via BI tooling, and these tools often use the more esoteric features of the Postgres dialect to bootstrap their UIs (think table listings etc). I still have a fair bit of work to do on this, but I can at least sort of see a path to having it working now. I suspect this will be beneficial for lots of other use cases, but admittedly I don't have any evidence for that claim beyond intuition 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the dialect default to Postgres, but be configurable, with a doc warning mentioning the potential risks and edge cases
I think this makes sense
mention nullary functions as a replacement for scalar variables in SQL in the next set of release notes
👍
random bonus thought: maybe expose a unary function to retrieve scalar variables by name from SQL?
This also sounds good -- if you wrote it up as a ticket, I bet it is a good "first issue" type ticket for someone else in the community to do if they were looking for something to learn DataFusion code a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the continued discussion and progress on this. I think the outcome is good. Just for context, I have built database gateways in the past that mimic MySQL and Hive protocols and dialects (different projects) and although I don't work on any projects these days that need either, I am just trying to keep options open for future users of DataFusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I definitely wouldn't want to impose unnecessary constraints on people who did want to use other dialects :) If people are happy with the ideas in my last comment, I'll close this, get some new issues logged, and start tackling those.
Removing approval as I think that @returnString is reworking the approach
Ping @returnString -- is do you mind if I close this PR (to clean up the PR review queue)? |
Yeah no worries, I need to set aside some time to write up the takeaways from the review thread as a ticket and followup PR, will close for now :) Thanks for chasing this up! |
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Some background first! @Dandandan did the initial work here and identified the VarProvider stuff as a potential followup issue in apache/arrow#9541. This PR just implements the original dialect change with the proposed extra change of removing scalar vars - thanks Daniël! 😄
Which issue does this PR close?
Closes #183.
Rationale for this change
This formalises our decision to use the Postgres SQL dialect.
What changes are included in this PR?
sqlparser-rs
Are there any user-facing changes?
DataFusion (unsure if this worked before in Ballista) no longer supports user or system variables exposed with the non-standard
@
prefix.