Change SQL dialect to PostgreSQL #214

returnString · 2021-04-27T20:38:15Z

Some background first! @Dandandan did the initial work here and identified the VarProvider stuff as a potential followup issue in apache/arrow#9541. This PR just implements the original dialect change with the proposed extra change of removing scalar vars - thanks Daniël! 😄

Which issue does this PR close?

Closes #183.

Rationale for this change

This formalises our decision to use the Postgres SQL dialect.

What changes are included in this PR?

The DataFusion SQL parser now uses the Postgres dialect from sqlparser-rs
The notion of ScalarVariable exprs and the related VarProvider concept have been removed entirely, as these aren't supported in the Postgres dialect and the use case can be handled with nullary functions as is done in Postgres.
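
To make the "nullary functions instead of scalar variables" point concrete, here is a minimal, std-only sketch. Everything in it (`Scalar`, `NullaryRegistry`, `example_registry`, the registered values) is hypothetical and is not DataFusion's API; it only illustrates how a value previously exposed as `@some_var` could instead be served by a zero-argument function call such as `current_database()`.

```rust
use std::collections::HashMap;

/// Toy scalar value; stands in for a real engine's scalar type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(String),
    Int64(i64),
}

/// Hypothetical registry of zero-argument (nullary) functions.
/// This is an illustration only, not DataFusion's function registry.
#[derive(Default)]
struct NullaryRegistry {
    funcs: HashMap<String, Box<dyn Fn() -> Scalar>>,
}

impl NullaryRegistry {
    /// Register a nullary function under a case-insensitive name.
    fn register(&mut self, name: &str, f: impl Fn() -> Scalar + 'static) {
        self.funcs.insert(name.to_lowercase(), Box::new(f));
    }

    /// Evaluate call text such as `current_database()`.
    /// Returns None if the text is not a call to a registered function.
    fn call(&self, sql_call: &str) -> Option<Scalar> {
        let name = sql_call.trim().strip_suffix("()")?;
        self.funcs.get(&name.to_lowercase()).map(|f| f())
    }
}

/// Example registry with made-up values (assumptions for illustration).
fn example_registry() -> NullaryRegistry {
    let mut reg = NullaryRegistry::default();
    reg.register("current_database", || Scalar::Utf8("mydb".to_string()));
    reg.register("pg_backend_pid", || Scalar::Int64(4242));
    reg
}
```

With this shape, `example_registry().call("current_database()")` yields the registered value, while `@`-prefixed text is simply not recognised as a call.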

Are there any user-facing changes?

DataFusion (unsure if this worked before in Ballista) no longer supports user or system variables exposed with the non-standard @ prefix.

Dandandan · 2021-04-27T20:46:56Z

Cool, thanks @returnString. To me it sounds like a good decision to remove support for the ScalarVariable. I am not sure there is a user for the functionality?
What do you think, @alamb @andygrove?

alamb

I think the removal of ScalarVariable may be a problem.

alamb · 2021-04-28T10:31:49Z

datafusion/src/execution/context.rs

@@ -916,35 +898,6 @@ mod tests {
        Ok(())
    }

-    #[tokio::test]
-    async fn create_variable_expr() -> Result<()> {


I think these were added in apache/arrow#8135. Perhaps @wqc200 has some comment about how (or if) this feature is used?

I don't think we should remove support for scalar variables in the logical plan. This seems unrelated to changing the default SQL dialect. Users can use ScalarVariable without using SQL.

alamb · 2021-04-28T10:34:38Z

datafusion/src/sql/parser.rs

@@ -21,7 +21,7 @@

 use sqlparser::{
    ast::{ColumnDef, ColumnOptionDef, Statement as SQLStatement, TableConstraint},
-    dialect::{keywords::Keyword, Dialect, GenericDialect},
+    dialect::{keywords::Keyword, Dialect, PostgreSqlDialect},


Another approach might be to make the SQL dialect be configurable on ExecutionConfig so that users could choose what dialect they wanted to mimic: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/context.rs#L606

I think @andygrove has said in the past that using DataFusion to mimic engines such as MySQL is a good use case too.

That's an interesting approach - we'd have to be careful about our sqlparser => expr conversions, but that could be quite useful as a feature 🤔

Thinking about it more over lunch: if we enable a bring-your-own-dialect setup, we'd need a decent testing strategy to support this. In this example, @someval will be parsed as UnaryOp { op: PGAbs, expr: Ident("someval") } for Postgres, so we'd need to decide how we implement DF-specific parsing overrides, e.g. as used to support this var provider system, on a per-dialect basis.
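
A toy illustration of the ambiguity described above, assuming a drastically simplified expression tree. `Expr`, `Dialect`, and `parse_at_expr` are hypothetical names and this is not how sqlparser-rs or DataFusion are implemented; the sketch only shows that the same `@someval` text yields a different tree depending on the dialect, which is what any per-dialect override mechanism would have to account for.

```rust
/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    /// Postgres-style prefix operator `@` (absolute value).
    Abs(Box<Expr>),
    /// Engine-specific user variable such as `@someval`.
    ScalarVariable(String),
}

/// Hypothetical dialect switch; a real parser has many more dialects.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dialect {
    Generic,
    Postgres,
}

/// Interpret text of the form `@name` under the given dialect.
/// Returns None for anything that is not `@` followed by a plain identifier.
fn parse_at_expr(dialect: Dialect, input: &str) -> Option<Expr> {
    let name = input.trim().strip_prefix('@')?.trim();
    let is_ident = !name.is_empty()
        && name.chars().all(|c| c.is_alphanumeric() || c == '_');
    if !is_ident {
        return None;
    }
    Some(match dialect {
        // In Postgres, `@` is a unary operator applied to an identifier.
        Dialect::Postgres => Expr::Abs(Box::new(Expr::Ident(name.to_string()))),
        // With a variable-style override, the whole token is a variable.
        Dialect::Generic => Expr::ScalarVariable(name.to_string()),
    })
}
```

A test matrix for a bring-your-own-dialect setup would presumably need at least one case like this per dialect and per override.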

That does sound fairly complicated... Personally and project-wise, I don't have a need for a MySQL-specific mimic (Postgres is good enough for us in IOx), but I think perhaps we should let others weigh in here.

This is tricky for sure. I would be fine with having Postgres as the default and officially supported (well tested) dialect, while also allowing users to provide a dialect at their own risk (and have this be well documented), but even that might create an undue burden on maintainers. It would definitely be good to try and find out more about our users' requirements here.

Just to be totally clear, I'm definitely not married to the contents of this PR as they stand; if the consensus is that we do want to support multiple dialects, I'm happy to rework it towards that goal 🙂

If that's a route we want to explore in depth, we could perhaps build some sort of dialect config setup that allows for controlling parser overrides in DF, but that feels a bit too heavyweight if we don't have tonnes of use cases right now (and hence no-one to drive or own that work).

Here's my immediate idea:

revert the deletion of all the logical plan stuff here, and just retain the parser override deletion

make the dialect default to Postgres, but be configurable, with a doc warning mentioning the potential risks and edge cases

mention nullary functions as a replacement for scalar variables in SQL in the next set of release notes

random bonus thought: maybe expose a unary function to retrieve scalar variables by name from SQL?

Does that sound vaguely sensible?
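
As a rough sketch of the "default to Postgres, but configurable" idea in the list above: the types below (`SqlDialect`, `ExecConfig`, `with_dialect`, `is_officially_supported`) are hypothetical and std-only, and are not the real `ExecutionConfig`. They only show the shape such an option might take, with the non-default dialects being the "at your own risk" path the doc warning would cover.

```rust
/// Hypothetical set of selectable dialects.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SqlDialect {
    Postgres,
    MySql,
    Generic,
}

impl Default for SqlDialect {
    /// Postgres is the proposed default dialect.
    fn default() -> Self {
        SqlDialect::Postgres
    }
}

/// Hypothetical execution config holding only the dialect setting.
#[derive(Debug, Clone, Default)]
struct ExecConfig {
    dialect: SqlDialect,
}

impl ExecConfig {
    /// Create a config with default settings (Postgres dialect).
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style override of the dialect.
    fn with_dialect(mut self, dialect: SqlDialect) -> Self {
        self.dialect = dialect;
        self
    }

    /// Only the default dialect is treated as well tested in this sketch;
    /// anything else is used at the caller's own risk.
    fn is_officially_supported(&self) -> bool {
        self.dialect == SqlDialect::Postgres
    }
}
```

Callers who never touch the option get Postgres, and `ExecConfig::new().with_dialect(SqlDialect::MySql)` opts in to a dialect outside the tested path.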

Full disclosure: my goal here is to get a DataFusion-powered service queryable via BI tooling, and these tools often use the more esoteric features of the Postgres dialect to bootstrap their UIs (think table listings etc). I still have a fair bit of work to do on this, but I can at least sort of see a path to having it working now. I suspect this will be beneficial for lots of other use cases, but admittedly I don't have any evidence for that claim beyond intuition 😅

> make the dialect default to Postgres, but be configurable, with a doc warning mentioning the potential risks and edge cases

I think this makes sense

> mention nullary functions as a replacement for scalar variables in SQL in the next set of release notes

👍

> random bonus thought: maybe expose a unary function to retrieve scalar variables by name from SQL?

This also sounds good -- if you wrote it up as a ticket, I bet it is a good "first issue" type ticket for someone else in the community to do if they were looking for something to learn DataFusion code a bit
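
For anyone picking up that ticket, a minimal sketch of what "a unary function to retrieve scalar variables by name" could mean. The function name `get_var`, the string-keyed store, and the call-text matching are all assumptions made for illustration; a real implementation would be registered as a scalar function and receive its argument from the planner rather than parse text.

```rust
use std::collections::HashMap;

/// Made-up variable store (assumption for illustration).
fn example_vars() -> HashMap<String, String> {
    let mut vars = HashMap::new();
    vars.insert("someval".to_string(), "42".to_string());
    vars
}

/// Evaluate call text of the form `get_var('name')` against the store.
/// Returns None if the text has a different shape or the name is unknown.
fn eval_get_var(vars: &HashMap<String, String>, call: &str) -> Option<String> {
    // Strip the hypothetical function name and the parentheses.
    let arg = call
        .trim()
        .strip_prefix("get_var(")?
        .strip_suffix(')')?
        .trim();
    // The single argument must be a single-quoted string literal.
    let name = arg.strip_prefix('\'')?.strip_suffix('\'')?;
    vars.get(name).cloned()
}
```

This keeps variable lookup available from SQL without relying on the non-standard `@` prefix.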

Thanks for the continued discussion and progress on this. I think the outcome is good. Just for context, I have built database gateways in the past that mimic MySQL and Hive protocols and dialects (different projects) and although I don't work on any projects these days that need either, I am just trying to keep options open for future users of DataFusion.

Yeah, I definitely wouldn't want to impose unnecessary constraints on people who did want to use other dialects :) If people are happy with the ideas in my last comment, I'll close this, get some new issues logged, and start tackling those.

@returnString

Removing approval as I think that @returnString is reworking the approach

alamb · 2021-06-02T20:35:55Z

Ping @returnString -- do you mind if I close this PR (to clean up the PR review queue)?

returnString · 2021-06-02T20:38:31Z

Yeah no worries, I need to set aside some time to write up the takeaways from the review thread as a ticket and followup PR, will close for now :) Thanks for chasing this up!

Co-authored-by: Dan Harris <[email protected]>

returnString added 2 commits April 27, 2021 21:25

Remove ScalarVariable expr and related concepts

8fc3511

Use Postgres dialect for SQL parsing

2c59fb0

Dandandan previously approved these changes Apr 28, 2021

jorgecarleitao requested review from alamb and andygrove April 28, 2021 10:04

alamb reviewed Apr 28, 2021

alamb marked this pull request as draft May 6, 2021 18:13

returnString closed this Jun 2, 2021

Dandandan pushed a commit to Dandandan/arrow-datafusion that referenced this pull request Sep 29, 2023

Fix bug in simplify expressions (apache#214)

5b5c103

Dandandan added a commit that referenced this pull request Sep 29, 2023

Fix bug in simplify expressions (#214) (#7699)

2abacf4

Co-authored-by: Dan Harris <[email protected]>

Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Oct 7, 2023

Fix bug in simplify expressions (apache#214) (apache#7699)

ec2fc6f

Co-authored-by: Dan Harris <[email protected]>

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
