Is your feature request related to a problem or challenge?
The partition_by COPY option is multivalued, e.g.:
COPY table to file.parquet (partition_by 'a,b,c')
Currently this is handled by passing a comma-separated string literal to the COPY statement; the string is split on commas later, during planning. This parsing is not robust to edge cases (e.g. it cannot handle a column name that itself contains a comma).
Other systems (e.g. DuckDB) have a special syntax for the partition_by option: https://duckdb.org/docs/data/partitioning/partitioned_writes.html
We could support this same syntax with parser updates.
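To make the edge case concrete, here is a minimal sketch of split-on-comma handling. The function name `split_partition_by` is hypothetical and this is not DataFusion's actual planner code; it only mimics the behavior described above.

```rust
/// Hypothetical stand-in for the planner's handling of `partition_by`:
/// split the raw option string on every comma.
fn split_partition_by(raw: &str) -> Vec<String> {
    raw.split(',').map(|s| s.to_string()).collect()
}

fn main() {
    // Three simple column names come back as three entries.
    assert_eq!(split_partition_by("a,b,c"), vec!["a", "b", "c"]);

    // A single column literally named `a,b` cannot be expressed:
    // it is indistinguishable from the two columns `a` and `b`.
    assert_eq!(split_partition_by("a,b"), vec!["a", "b"]);
}
```

Once the names are flattened into one string, no later stage can tell a separator comma from a comma that belongs to a name, which is why the fix has to happen in the parser.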
Describe the solution you'd like
Add support for multivalued COPY options in DFParser. E.g.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CopyToStatement {
    /// From where the data comes from
    pub source: CopyToSource,
    /// The URL to where the data is heading
    pub target: String,
    /// Target specific options
    pub options: Vec<(String, CopyToOptionValue)>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CopyToOptionValue {
    /// A single [Value], e.g. (format parquet)
    Single(Value),
    /// A list of [Value]s, e.g. (partition_by ("a", "b", "c"))
    List(Vec<String>),
}

pub fn parse_option_value(&mut self) -> Result<CopyToOptionValue, ParserError> {
    let next_token = self.parser.peek_token();
    match next_token.token {
        Token::Word(Word { value, .. }) => {
            self.parser.next_token();
            Ok(CopyToOptionValue::Single(Value::UnQuotedString(value)))
        },
        Token::SingleQuotedString(s) => {
            self.parser.next_token();
            Ok(CopyToOptionValue::Single(Value::SingleQuotedString(s)))
        },
        Token::DoubleQuotedString(s) => {
            self.parser.next_token();
            Ok(CopyToOptionValue::Single(Value::DoubleQuotedString(s)))
        },
        Token::EscapedStringLiteral(s) => {
            self.parser.next_token();
            Ok(CopyToOptionValue::Single(Value::EscapedStringLiteral(s)))
        },
        Token::Number(ref n, l) => {
            self.parser.next_token();
            match n.parse() {
                Ok(n) => Ok(CopyToOptionValue::Single(Value::Number(n, l))),
                // The tokenizer should have ensured `n` is an integer
                // so this should not be possible
                Err(e) => parser_err!(format!(
                    "Unexpected error: could not parse '{n}' as number: {e}"
                )),
            }
        },
        Token::LParen => {
            Ok(CopyToOptionValue::List(self.parse_partitions()?))
        },
        _ => self.parser.expected("string or numeric value", next_token),
    }
}
The CopyTo logical plan will also need to be updated to accept multivalued options. Rewiring the code to handle them will take a fair amount of work.
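As a rough illustration of what downstream code might look like once options can be multivalued, the sketch below normalizes an option value into a list of column names. `OptionValue` is a simplified, hypothetical stand-in for the proposed `CopyToOptionValue` (it holds plain strings rather than sqlparser `Value`s), and `partition_columns` is an invented helper. Keeping the legacy comma-separated form working is an assumption, not something the proposal specifies.

```rust
/// Simplified, hypothetical stand-in for the proposed `CopyToOptionValue`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum OptionValue {
    /// A single value, e.g. (partition_by 'a,b,c')
    Single(String),
    /// A list of values, e.g. (partition_by ("a", "b", "c"))
    List(Vec<String>),
}

/// Hypothetical helper: resolve a `partition_by` option to column names.
fn partition_columns(value: &OptionValue) -> Vec<String> {
    match value {
        // Assumed legacy behavior: split the string on commas.
        OptionValue::Single(s) => s.split(',').map(str::to_string).collect(),
        // List form: each element is taken verbatim, commas included.
        OptionValue::List(cols) => cols.clone(),
    }
}

fn main() {
    let legacy = OptionValue::Single("a,b,c".to_string());
    assert_eq!(partition_columns(&legacy), vec!["a", "b", "c"]);

    // A column named `a,b` survives intact in the list form.
    let list = OptionValue::List(vec!["a,b".to_string(), "c".to_string()]);
    assert_eq!(partition_columns(&list), vec!["a,b", "c"]);
}
```

Every consumer of COPY options would need a similar match on the two variants, which is where most of the rewiring effort would go.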
Describe alternatives you've considered
Keep the parser and logical plan as-is. Partitioning by columns containing commas in their name may be a rare enough special case that we can simply not support it.
Additional context
No response