Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573

goldmedal · 2024-05-19T02:10:49Z

Which issue does this PR close?

Closes #10557

Rationale for this change

What changes are included in this PR?

This PR implements only the default dialect. Other dialects will need follow-up PRs.

Are these changes tested?

Yes

Are there any user-facing changes?

No

comphead · 2024-05-20T17:28:46Z

datafusion/sql/Cargo.toml

@@ -47,6 +47,7 @@ arrow-schema = { workspace = true }
 datafusion-common = { workspace = true, default-features = true }
 datafusion-expr = { workspace = true }
 log = { workspace = true }
+regex = { version = "1.8" }


I think we need to move regex to the top level, as it is used in many of the packages. It can be done as a follow-up.

I moved it in 32aa0e9.

comphead · 2024-05-20T17:40:23Z

datafusion/sql/src/unparser/expr.rs

@@ -504,6 +508,14 @@ impl Unparser<'_> {
            .collect::<Result<Vec<_>>>()
    }

+    pub(super) fn new_ident_quoted_if_needs(&self, ident: String) -> ast::Ident {


Please add a doc comment for this pub method.

I added some comments in 7a534fb.

comphead

Thanks @goldmedal
I'm wondering how this will work with column names that contain whitespace, like

select 1 as "a a";

goldmedal · 2024-05-21T11:54:48Z

Thanks @comphead :)
I'm not sure what you mean, but I think it works the same way as any other character that is illegal in a SQL identifier. I added a test case for it in 44e9baa.
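To illustrate the whitespace case, here is a minimal sketch of a quote-if-needed check. It is not the PR's implementation (the diff imports `regex::Regex` and `sqlparser::keywords::ALL_KEYWORDS`); the function name `needs_quote`, the identifier rule, and the tiny `KEYWORDS` list are assumptions made for illustration, and only the standard library is used.

```rust
/// Hypothetical stand-in for a real keyword table (assumption, not the PR's list).
const KEYWORDS: &[&str] = &["SELECT", "FROM", "WHERE", "TABLE"];

/// Returns true when `ident` should be wrapped in quotes.
/// Assumed rule: a bare identifier starts with an ASCII letter or `_`,
/// continues with ASCII letters, digits, or `_`, and is not a keyword.
fn needs_quote(ident: &str) -> bool {
    let mut chars = ident.chars();
    // An empty identifier has no first character, so it fails this check.
    let valid_start =
        matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    // The remaining characters must all be word characters.
    let valid_rest = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    let is_keyword = KEYWORDS.iter().any(|k| k.eq_ignore_ascii_case(ident));
    !(valid_start && valid_rest) || is_keyword
}

fn main() {
    for ident in ["a", "a a", "select", "1col"] {
        println!("{ident:?} needs quote: {}", needs_quote(ident));
    }
}
```

Under this assumed rule, `a a` is treated like any other identifier containing a non-word character: the space fails the check, so the identifier is quoted.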

alamb

Thank you @goldmedal -- I think this looks really nice

Thank you for the reviews @comphead

I left some suggestions for improvement but I think they could be done as follow on PRs as well.

cc @phillipleblanc and @devinjdangelo and @backkem

alamb · 2024-05-21T20:41:50Z

datafusion-examples/examples/plan_to_sql.rs

@@ -52,7 +52,7 @@ fn simple_expr_to_sql_demo() -> Result<()> {
    let expr = col("a").lt(lit(5)).or(col("a").eq(lit(8)));
    let ast = expr_to_sql(&expr)?;
    let sql = format!("{}", ast);
-    assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#);
+    assert_eq!(sql, r#"((a < 5) OR (a = 8))"#);


Given this change, perhaps we can remove the next example in the file, simple_expr_to_sql_demo_no_escape, as I don't think it serves any purpose.

alamb · 2024-05-21T20:42:24Z

datafusion/core/Cargo.toml

@@ -145,7 +145,7 @@ postgres-protocol = "0.6.4"
 postgres-types = { version = "0.2.4", features = ["derive", "with-chrono-0_4"] }
 rand = { workspace = true, features = ["small_rng"] }
 rand_distr = "0.4.3"
-regex = "1.5.4"
+regex = { workspace = true }


It is certainly nice to use the same version of regex everywhere 👍

alamb · 2024-05-21T20:45:36Z

datafusion/sql/src/unparser/dialect.rs

@@ -15,19 +15,30 @@
 // specific language governing permissions and limitations
 // under the License.

+use regex::Regex;
+use sqlparser::keywords::ALL_KEYWORDS;
+
 /// Dialect is used to capture dialect specific syntax.


Suggested change

/// Dialect is used to capture dialect specific syntax.

/// `Dialect` to usse for Unparsing

///

/// The default dialect tries to avoid quoting identifiers unless necessary (e.g. `a` instead of `"a"`)

/// but this behavior can be overridden as needed

Thanks. Looks nice.

goldmedal · 2024-05-22T05:16:22Z

Thanks @alamb !
I think the suggestions are simple and reasonable, so I fixed them in this PR right away.

phillipleblanc

Awesome! Thanks @goldmedal 🥇

phillipleblanc · 2024-05-22T05:24:10Z

datafusion/sql/src/unparser/dialect.rs

+use regex::Regex;
+use sqlparser::keywords::ALL_KEYWORDS;
+
+/// `Dialect` to usse for Unparsing


Suggested change

/// `Dialect` to usse for Unparsing

/// `Dialect` to use for Unparsing

Thanks @phillipleblanc

lewiszlw · 2024-05-22T08:41:55Z

datafusion/sql/src/unparser/dialect.rs

 /// Note: this trait will eventually be replaced by the Dialect in the SQLparser package
 ///
 /// See <https://github.com/sqlparser-rs/sqlparser-rs/pull/1170>
 pub trait Dialect {
    fn identifier_quote_style(&self) -> Option<char>;
+    fn identifier_needs_quote(&self, _: &str) -> bool {


The note above says this trait will eventually be replaced by the Dialect in the SQLparser package. It seems this PR makes that harder. Should we extend the sqlparser Dialect with something like a DialectExt trait?

I wanted to note that this functionality could also be covered by the existing SQLparser::Dialect::identifier_quote_style. Its signature looks as follows:

identifier_quote_style(&self, _identifier: &str) -> Option<char>

It is passed the identifier and can optionally return a quote character if needed. This way the trait doesn't need extending at all. See also apache/datafusion-sqlparser-rs#1170.
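A minimal sketch of how a single method with this shape can drive quoting on its own. The trait below is a local stand-in that mirrors the signature quoted above, not the sqlparser trait itself; `SpaceAwareDialect`, `AlwaysQuoteDialect`, and `render_ident` are hypothetical names, and the whitespace-only rule is a deliberately simplified assumption.

```rust
/// Local stand-in mirroring the signature discussed above (not the sqlparser trait).
trait Dialect {
    fn identifier_quote_style(&self, identifier: &str) -> Option<char>;
}

/// Hypothetical dialect: quote only empty identifiers or ones containing whitespace.
struct SpaceAwareDialect;

impl Dialect for SpaceAwareDialect {
    fn identifier_quote_style(&self, identifier: &str) -> Option<char> {
        if identifier.is_empty() || identifier.chars().any(char::is_whitespace) {
            Some('"')
        } else {
            None
        }
    }
}

/// Hypothetical dialect: always quote with backticks, whatever the identifier is.
struct AlwaysQuoteDialect;

impl Dialect for AlwaysQuoteDialect {
    fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {
        Some('`')
    }
}

/// Wraps `identifier` in the quote character chosen by the dialect, if any.
/// Escaping of quote characters inside the identifier is out of scope here.
fn render_ident(dialect: &dyn Dialect, identifier: &str) -> String {
    match dialect.identifier_quote_style(identifier) {
        Some(q) => format!("{q}{identifier}{q}"),
        None => identifier.to_string(),
    }
}

fn main() {
    println!("{}", render_ident(&SpaceAwareDialect, "a"));
    println!("{}", render_ident(&SpaceAwareDialect, "a a"));
    println!("{}", render_ident(&AlwaysQuoteDialect, "a"));
}
```

Because the method receives the identifier, "which quote character" and "whether to quote at all" are answered by one return value, so no second trait method is needed.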

@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per @backkem 's suggestion in a follow on PR, or would you like to update this PR?

Thanks @lewiszlw @backkem @alamb
I think I have time to fix it now. I can fix it in this PR.

backkem

LGTM with one small nit.

backkem · 2024-05-22T12:15:35Z

datafusion/sql/src/unparser/dialect.rs

 }
 pub struct DefaultDialect {}

 impl Dialect for DefaultDialect {
-    fn identifier_quote_style(&self) -> Option<char> {
-        Some('"')
+    fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {


Suggested change

fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {

fn identifier_quote_style(&self, identifier: &str) -> Option<char> {

@backkem I want to check if I should also change the signature (L29) in the Dialect trait. I'm not familiar with naming conventions in Rust. I guess _identifier means this parameter is an identifier, but we ignore it in this method, right?

Indeed, prefixing the identifier with an _ is a convention for silencing a linter warning that the variable is unused. Since it is being used now, the _ prefix is no longer needed.
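As a small illustration of that convention, here is a sketch with a locally defined trait (a stand-in, not the DataFusion or sqlparser one). `IgnoresInput` and `UsesInput` are hypothetical types, and the space check is an assumed rule chosen only so that the parameter gets read.

```rust
trait Dialect {
    /// The default body never reads the argument, so the `_` prefix
    /// keeps the `unused_variables` lint quiet.
    fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {
        None
    }
}

/// Relies on the default method and therefore never looks at the identifier.
struct IgnoresInput;

impl Dialect for IgnoresInput {}

/// Reads the identifier, so the parameter is named without the `_` prefix.
struct UsesInput;

// Turn the lint into a hard error for this impl to show that the
// unprefixed name is fine once the parameter is actually used.
#[deny(unused_variables)]
impl Dialect for UsesInput {
    fn identifier_quote_style(&self, identifier: &str) -> Option<char> {
        if identifier.contains(' ') {
            Some('"')
        } else {
            None
        }
    }
}

fn main() {
    println!("{:?}", IgnoresInput.identifier_quote_style("a a"));
    println!("{:?}", UsesInput.identifier_quote_style("a a"));
}
```

Parameter names in an impl do not have to match the names in the trait declaration, so the trait and the impl can each use whichever spelling reflects whether the value is read.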

alamb

Thank you again @goldmedal and @backkem and @phillipleblanc and @lewiszlw and @comphead -- I think this PR looks really nice now and this makes unparsing much nicer looking for humans 🏆

goldmedal · 2024-05-23T00:13:53Z

Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :)

Omega359 · 2024-05-24T13:27:01Z

This shouldn't have passed checks.

+ cargo fmt --all -- --check
`cargo metadata` exited with an error: error: failed to load manifest for workspace member `/opt/dev/datafusion/datafusion/core`
referenced by workspace at `/opt/dev/datafusion/Cargo.toml`

Caused by:
  failed to load manifest for dependency `datafusion-functions`

Caused by:
  failed to parse manifest at `/opt/dev/datafusion/datafusion/functions/Cargo.toml`

Caused by:
  dependency (regex) specified without providing a local path, Git repository, version, or workspace dependency to use

functions/Cargo.toml

regex = { worksapce = true, optional = true }

alamb · 2024-05-25T12:13:48Z

Yeah, I don't know why that is a warning and not an error -- here is a PR to fix it: #10662

github-actions bot added the sql SQL Planner label May 19, 2024

goldmedal mentioned this pull request May 19, 2024

Make SQL strings generated from Exprs "prettier" #10557

Closed

comphead reviewed May 20, 2024


goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 4acde31 to 44e9baa Compare May 21, 2024 11:49

github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels May 21, 2024

alamb approved these changes May 21, 2024


phillipleblanc approved these changes May 22, 2024


lewiszlw reviewed May 22, 2024


backkem approved these changes May 22, 2024


goldmedal added 16 commits May 22, 2024 23:31

add ident needs quote check

e13d7dc

implement the check for default dialect and fix tests

bce7e41

add test for need-quoted cases

a616719

update cargo lock

b8e7dbe

fomrat cargo toml

0293ca7

fix the example test

4063d5d

move regex to top level

c0e03d1

add comments for new_ident_quoted_if_needs func

7430634

fix typo and add test for space

a881a65

fix example test

c9eb4a4

fix example test

2eba717

fix the test fail

dc75c2d

remove unused example and modified comments

603d0b4

fix typo

3a9125c

follow the latest Dialect trait in sqlparser

9a1d05c

fix the parameter name

8ed1525

goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 654c836 to 8ed1525 Compare May 22, 2024 15:38

alamb approved these changes May 22, 2024


alamb merged commit 7bd4b53 into apache:main May 22, 2024
25 checks passed

goldmedal deleted the feature/10557-dialect-need-qutoed branch May 23, 2024 00:13

alamb mentioned this pull request May 23, 2024

Make SQL strings generated from Exprs even "prettier" #10633

Closed

alamb mentioned this pull request May 25, 2024

Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) #10662

Merged

alamb mentioned this pull request May 28, 2024

DataFusion weekly project plan (Andrew Lamb) - May 27, 2024 #10699

Closed

9 tasks

goldmedal mentioned this pull request Jun 24, 2024

Introduce the calculation for TO_MANY relationship Canner/wren-engine#626

Merged
