Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DuckDB struct syntax and support list of struct syntax #1372

Merged
merged 14 commits into from
Aug 15, 2024
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@ Cargo.lock
.vscode

*.swp

.DS_store
24 changes: 20 additions & 4 deletions src/parser/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2136,7 +2136,7 @@ impl<'a> Parser<'a> {
/// ```
fn parse_bigquery_struct_literal(&mut self) -> Result<Expr, ParserError> {
let (fields, trailing_bracket) =
self.parse_struct_type_def(Self::parse_struct_field_def)?;
self.parse_struct_type_def(Self::parse_struct_field_def, Token::Lt)?;
if trailing_bracket.0 {
return parser_err!("unmatched > in STRUCT literal", self.peek_token().location);
}
Expand Down Expand Up @@ -2196,6 +2196,7 @@ impl<'a> Parser<'a> {
fn parse_struct_type_def<F>(
&mut self,
mut elem_parser: F,
token: Token,
) -> Result<(Vec<StructField>, MatchedTrailingBracket), ParserError>
where
F: FnMut(&mut Parser<'a>) -> Result<(StructField, MatchedTrailingBracket), ParserError>,
Expand All @@ -2204,7 +2205,7 @@ impl<'a> Parser<'a> {
self.expect_keyword(Keyword::STRUCT)?;

// Nothing to do if we have no type information.
if Token::Lt != self.peek_token() {
if token != self.peek_token() {
return Ok((Default::default(), false.into()));
}
self.next_token();
Expand All @@ -2227,7 +2228,15 @@ impl<'a> Parser<'a> {

Ok((
field_defs,
self.expect_closing_angle_bracket(trailing_bracket)?,
if token == Token::Lt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a slight mismatch I'm thinking might not be worth having - the open delimeter token is configurable now but here the function still needs to figure out for itself what the value is for the closing token.
The function is a bit complicated on its own, specifically due to the < delimeter. then since duckdb syntax doesn't have that I'm thinking it makes sense to use a dedicated function that is simpler.

Something roughly like this I think should be all duckdb need, would it make sense to add this instead and have duckdb (and any future dialect sharing the same syntax) use it?

   fn parse_duck_db_struct_type_def<F>(
        &mut self,
    ) -> Result<Vec<StructField>, ParserError>
    {
        let start_token = self.peek_token();
        self.expect_keyword(Keyword::STRUCT)?;

        self.expect_token(Token::LParen)?;

        let field_defs = self.parse_comma_separated(Self::parse_struct_field_def)?;

        self.expect_token(Token::RParen)?;

        Ok(field_defs)
    }

This comment was marked as outdated.

self.expect_closing_angle_bracket(trailing_bracket)?
} else {
if !trailing_bracket.0 {
self.expect_token(&Token::RParen)?;
};

false.into()
},
))
}

Expand Down Expand Up @@ -7229,10 +7238,17 @@ impl<'a> Parser<'a> {
))))
}
}
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => {
self.prev_token();
let (field_defs, _trailing_bracket) =
self.parse_struct_type_def(Self::parse_struct_field_def, Token::LParen)?;

Ok(DataType::Struct(field_defs))
}
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect) => {
self.prev_token();
let (field_defs, _trailing_bracket) =
self.parse_struct_type_def(Self::parse_struct_field_def)?;
self.parse_struct_type_def(Self::parse_struct_field_def, Token::Lt)?;
trailing_bracket = _trailing_bracket;
Ok(DataType::Struct(field_defs))
}
Expand Down
10 changes: 10 additions & 0 deletions tests/sqlparser_bigquery.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,16 @@ use sqlparser::dialect::{BigQueryDialect, GenericDialect};
use sqlparser::parser::{ParserError, ParserOptions};
use test_utils::*;

#[test]
fn test_struct() {
// nested struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let select = bigquery().parse_sql_statements(sql).unwrap().pop().unwrap();
// TODO: '>>' is incorrect parsed in bigquery syntax
assert_ne!(select.to_string(), canonical);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow if/why this test is needed, since bigquery doesn't support [] syntax for arrays?


#[test]
fn parse_literal_string() {
let sql = concat!(
Expand Down
33 changes: 33 additions & 0 deletions tests/sqlparser_duckdb.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,39 @@ fn duckdb_and_generic() -> TestedDialects {
}
}

#[test]
fn test_struct() {
// basic struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER))"#;
Copy link
Contributor

@iffyio iffyio Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think this is incorrect, no need for the input and output to not match, in this case the output is significantly different syntax from what went into the parser. We can probably add a parens_delimeter: bool or similar flag to the StructField struct so that it knows how to display accordingly?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me. Do you mean adding flag so we have different canonical name for struct<> and struct()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that's what I meant - my bad realised I wrote comma_delimeter but meant to write parens_delimeter instead. Yeah so like if that flag is true display uses struct (...) otherwise struct<...>

let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// struct array
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER)[])"#;
let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// nested struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, s STRUCT(a1 INTEGER, a2 VARCHAR))[])"#;
let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// failing test
let sql_list = vec![
r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER)))"#,
jayzhan211 marked this conversation as resolved.
Show resolved Hide resolved
r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER>)"#,
r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#,
r#"CREATE TABLE t1 (s STRUCT v VARCHAR, i INTEGER )"#,
];

for sql in sql_list {
duckdb().parse_sql_statements(sql).unwrap_err();
}
}
jayzhan211 marked this conversation as resolved.
Show resolved Hide resolved

#[test]
fn test_select_wildcard_with_exclude() {
let select = duckdb().verified_only_select("SELECT * EXCLUDE (col_a) FROM data");
Expand Down
Loading