-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DuckDB struct syntax and support list of struct syntax #1372
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Pull Request Test Coverage Report for Build 10404149988Details
💛 - Coveralls |
Signed-off-by: jayzhan211 <[email protected]>
src/parser/mod.rs
Outdated
@@ -1074,10 +1074,14 @@ impl<'a> Parser<'a> { | |||
Keyword::MATCH if dialect_of!(self is MySqlDialect | GenericDialect) => { | |||
self.parse_match_against() | |||
} | |||
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect) => { | |||
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this line?
src/parser/mod.rs
Outdated
self.prev_token(); | ||
self.parse_bigquery_struct_literal() | ||
} | ||
Keyword::STRUCT if dialect_of!(self is DuckDbDialect ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keyword::STRUCT if dialect_of!(self is DuckDbDialect ) => { | |
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => { |
src/parser/mod.rs
Outdated
match self.peek_token().token { | ||
Token::RParen => { | ||
self.next_token(); | ||
false.into() | ||
} | ||
_ => return self.expected(")", self.peek_token()), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match self.peek_token().token { | |
Token::RParen => { | |
self.next_token(); | |
false.into() | |
} | |
_ => return self.expected(")", self.peek_token()), | |
} | |
self.expect_token(&Token::RParen)?; | |
false.into() |
src/parser/mod.rs
Outdated
@@ -7229,10 +7274,18 @@ impl<'a> Parser<'a> { | |||
)))) | |||
} | |||
} | |||
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => { | |||
self.prev_token(); | |||
// let field_defs= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// let field_defs= |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great.
Thanks for your review @git-hulk |
src/parser/mod.rs
Outdated
@@ -2227,7 +2228,15 @@ impl<'a> Parser<'a> { | |||
|
|||
Ok(( | |||
field_defs, | |||
self.expect_closing_angle_bracket(trailing_bracket)?, | |||
if token == Token::Lt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a slight mismatch I'm thinking might not be worth having - the open delimeter token is configurable now but here the function still needs to figure out for itself what the value is for the closing token.
The function is a bit complicated on its own, specifically due to the <
delimeter. then since duckdb syntax doesn't have that I'm thinking it makes sense to use a dedicated function that is simpler.
Something roughly like this I think should be all duckdb need, would it make sense to add this instead and have duckdb (and any future dialect sharing the same syntax) use it?
fn parse_duck_db_struct_type_def<F>(
&mut self,
) -> Result<Vec<StructField>, ParserError>
{
let start_token = self.peek_token();
self.expect_keyword(Keyword::STRUCT)?;
self.expect_token(Token::LParen)?;
let field_defs = self.parse_comma_separated(Self::parse_struct_field_def)?;
self.expect_token(Token::RParen)?;
Ok(field_defs)
}
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
tests/sqlparser_bigquery.rs
Outdated
#[test] | ||
fn test_struct() { | ||
// nested struct | ||
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#; | ||
let sql = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#; | ||
let select = bigquery().parse_sql_statements(sql).unwrap().pop().unwrap(); | ||
// TODO: '>>' is incorrect parsed in bigquery syntax | ||
assert_ne!(select.to_string(), canonical); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow if/why this test is needed, since bigquery doesn't support []
syntax for arrays?
src/parser/mod.rs
Outdated
let field_defs = self | ||
.parse_comma_separated(Self::parse_struct_field_def)? | ||
.into_iter() | ||
.map(|(f, _)| f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this looks like a bug for us to silently drop the returned flag since there might be an extraneous >
, seems like we would instead do something like this
let field_defs = self.parse_comma_separated(|self| {
Ok(StructField {
field_name: Some(self.parse_identifier(false)?), // I assume field name isn't optional for duckdb?
field_type: self.parse_data_type(),
})
tests/sqlparser_duckdb.rs
Outdated
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#; | ||
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER))"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think this is incorrect, no need for the input and output to not match, in this case the output is significantly different syntax from what went into the parser. We can probably add a parens_delimeter: bool
or similar flag to the StructField struct so that it knows how to display accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me. Do you mean adding flag so we have different canonical name for struct<>
and struct()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah that's what I meant - my bad realised I wrote comma_delimeter
but meant to write parens_delimeter
instead. Yeah so like if that flag is true display uses struct (...)
otherwise struct<...>
I am hoping to make a sqlparser release soon (like tomorrow) -- do we think this PR is close to ready (aka should I hold the release for this PR?) |
Signed-off-by: jayzhan211 <[email protected]>
It is close. We can release together with this PR |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor cleanup comment, otherwise LGTM! cc @alamb
Co-authored-by: Ifeanyi Ubah <[email protected]>
I am working on updating the tests and adding some small comments. I'll be done shortly |
@@ -32,6 +32,118 @@ fn duckdb_and_generic() -> TestedDialects { | |||
} | |||
} | |||
|
|||
#[test] | |||
fn test_struct() { | |||
// s STRUCT(v VARCHAR, i INTEGER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the tests to also have coverage of the actual parsed data types
@@ -618,6 +625,17 @@ fn format_clickhouse_datetime_precision_and_timezone( | |||
Ok(()) | |||
} | |||
|
|||
/// Type of brackets used for `STRUCT` literals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added some comments on this struct
Thanks again @jayzhan211 and @iffyio and @git-hulk |
Close #1245