-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573
Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573
Conversation
datafusion/sql/Cargo.toml
Outdated
@@ -47,6 +47,7 @@ arrow-schema = { workspace = true } | |||
datafusion-common = { workspace = true, default-features = true } | |||
datafusion-expr = { workspace = true } | |||
log = { workspace = true } | |||
regex = { version = "1.8" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to move regex to top level, as it is used in much of packages. It can be done as followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it in 32aa0e9.
@@ -504,6 +508,14 @@ impl Unparser<'_> { | |||
.collect::<Result<Vec<_>>>() | |||
} | |||
|
|||
pub(super) fn new_ident_quoted_if_needs(&self, ident: String) -> ast::Ident { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a method comments for a pub method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments in 7a534fb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goldmedal
I'm thinking how this will work with whitespaces columns like
select 1 as "a a";
4acde31
to
44e9baa
Compare
Thanks @comphead :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @goldmedal -- I think this looks really nice
Thank you for the reviews @comphead
I left some suggestions for improvement but I think they could be done as follow on PRs as well.
cc @phillipleblanc and @devinjdangelo and @backkem
@@ -52,7 +52,7 @@ fn simple_expr_to_sql_demo() -> Result<()> { | |||
let expr = col("a").lt(lit(5)).or(col("a").eq(lit(8))); | |||
let ast = expr_to_sql(&expr)?; | |||
let sql = format!("{}", ast); | |||
assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#); | |||
assert_eq!(sql, r#"((a < 5) OR (a = 8))"#); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this change, perhaps we can remove the next example in the file simple_expr_to_sql_demo_no_escape
as I don't think it serves any purpose
@@ -145,7 +145,7 @@ postgres-protocol = "0.6.4" | |||
postgres-types = { version = "0.2.4", features = ["derive", "with-chrono-0_4"] } | |||
rand = { workspace = true, features = ["small_rng"] } | |||
rand_distr = "0.4.3" | |||
regex = "1.5.4" | |||
regex = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is certainly nice to use the same version of regex everywhere 👍
@@ -15,19 +15,30 @@ | |||
// specific language governing permissions and limitations | |||
// under the License. | |||
|
|||
use regex::Regex; | |||
use sqlparser::keywords::ALL_KEYWORDS; | |||
|
|||
/// Dialect is used to capture dialect specific syntax. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Dialect is used to capture dialect specific syntax. | |
/// `Dialect` to usse for Unparsing | |
/// | |
/// The default dialect tries to avoid quoting identifiers unless necessary (e.g. `a` instead of `"a"`) | |
/// but this behavior can be overridden as needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Look nice.
Thanks @alamb ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks @goldmedal 🥇
use regex::Regex; | ||
use sqlparser::keywords::ALL_KEYWORDS; | ||
|
||
/// `Dialect` to usse for Unparsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `Dialect` to usse for Unparsing | |
/// `Dialect` to use for Unparsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @phillipleblanc
/// Note: this trait will eventually be replaced by the Dialect in the SQLparser package | ||
/// | ||
/// See <https://github.com/sqlparser-rs/sqlparser-rs/pull/1170> | ||
pub trait Dialect { | ||
fn identifier_quote_style(&self) -> Option<char>; | ||
fn identifier_needs_quote(&self, _: &str) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above note said this trait will eventually be replaced by the Dialect in the SQLparser package. Seems this pr make this harder. Should we extend sqlparser Dialect
using something like DialectExt
trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to note that this functionality could also be covered within the existing SQLparser::Dialect::identifier_quote_style
. It's signature looks as follows:
identifier_quote_style(&self, _identifier: &str) -> Option<char>
It is passed the identifier
and can optionally return a quote character if needed. This way the trait doesn't need extending at all. See also apache/datafusion-sqlparser-rs#1170.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per @backkem 's suggestion in a follow on PR, or would you like to update this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per
@backkem 's suggestion in a follow on PR, or would you like to update this PR?
Thanks @lewiszlw @backkem @alamb
I think I have time to fix it now. I can fix it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one small nit.
} | ||
pub struct DefaultDialect {} | ||
|
||
impl Dialect for DefaultDialect { | ||
fn identifier_quote_style(&self) -> Option<char> { | ||
Some('"') | ||
fn identifier_quote_style(&self, _identifier: &str) -> Option<char> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn identifier_quote_style(&self, _identifier: &str) -> Option<char> { | |
fn identifier_quote_style(&self, identifier: &str) -> Option<char> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@backkem I want to check if I should also change the signature (L29) in the Dialect trait. I'm not familiar with naming conventions in Rust. I guess _identifier means this parameter is an identifier, but we ignore it in this method, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, prefixing the identifier with an _
is a convention for silencing a linter warning that the variable is unused. Since it is being used now, the _
prefix is no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
654c836
to
8ed1525
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again @goldmedal and @backkem and @phillipleblanc and @lewiszlw and @comphead -- I think this PR looks really nice now and this makes unparsing much nicer looking for humans 🏆
Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :) |
This shouldn't have passed checks.
functions/Cargo.toml
|
Yeah, I don't know why that is a warning and not an error -- here is a PR to fix it: #10662 |
… without quotes (apache#10573) * add ident needs quote check * implement the check for default dialect and fix tests * add test for need-quoted cases * update cargo lock * fomrat cargo toml * fix the example test * move regex to top level * add comments for new_ident_quoted_if_needs func * fix typo and add test for space * fix example test * fix example test * fix the test fail * remove unused example and modified comments * fix typo * follow the latest Dialect trait in sqlparser * fix the parameter name
Which issue does this PR close?
Closes #10557
Rationale for this change
What changes are included in this PR?
Only implement the default dialect in this PR. We need other follow-up PR for other dialects.
Are these changes tested?
Yes
Are there any user-facing changes?
No