support unprefixed config format options #9594

Closed
Changes from 1 commit
20 changes: 20 additions & 0 deletions datafusion/sql/src/statement.rs
@@ -855,6 +855,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
        let file_type = try_infer_file_type(&mut options, &statement.target)?;
        let partition_by = take_partition_by(&mut options);

        match &file_type {
            // Renames un-prefixed keys to support legacy format specific options
            FileType::CSV | FileType::JSON | FileType::PARQUET => {
                let prefix = format!("{}", file_type);
                let keys_to_rename: Vec<_> = options
                    .keys()
                    .filter(|key| !key.starts_with(&prefix))
                    .cloned()
                    .collect();

                for key in keys_to_rename {
                    if let Some(value) = options.remove(&key) {
                        let new_key = format!("{}.{}", prefix, key);
                        options.insert(new_key, value);
                    }
                }
            }
            _ => {}
        }

Contributor


Suggested change
        let options = match &file_type {
            // Renames un-prefixed keys to support legacy format specific options
            FileType::CSV | FileType::JSON | FileType::PARQUET => {
                options
                    .into_iter()
                    .map(|(k, v)| {
                        // If config does not belong to any namespace, assume it is
                        // a legacy option and apply file_type namespace for backwards
                        // compatibility.
                        if !k.contains('.') {
                            let new_key = format!("{}.{}", file_type, k);
                            (new_key, v)
                        } else {
                            (k, v)
                        }
                    })
                    .collect()
            }
            _ => options,
        };

Contributor


The overall strategy here looks good, but I propose two changes:

  1. We can make the logic more concise by rewriting options in a single pass with into_iter, map, and collect. This also avoids cloning the keys.
  2. I think we should be a bit more conservative about which keys we modify. If a key has any namespace at all (not just the file format namespace), we should leave it as is. Downstream users may, for example, add a custom namespace, and we wouldn't want this code to modify it unexpectedly.
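Both suggestions can be illustrated with a small standalone sketch. It is not DataFusion code: it assumes a simplified, hypothetical `FileType` enum whose `Display` output is lowercase, plain `HashMap<String, String>` options, and a hypothetical helper name `namespace_legacy_options`; the option keys in `main` are illustrative only.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical stand-in for DataFusion's `FileType`; the lowercase
/// `Display` output is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FileType {
    Csv,
    Json,
    Parquet,
    Avro,
}

impl fmt::Display for FileType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            FileType::Csv => "csv",
            FileType::Json => "json",
            FileType::Parquet => "parquet",
            FileType::Avro => "avro",
        };
        write!(f, "{}", name)
    }
}

/// Hypothetical helper: consumes the map once (no key clones) and prefixes
/// only keys that have no namespace at all, i.e. contain no '.'.
fn namespace_legacy_options(
    options: HashMap<String, String>,
    file_type: FileType,
) -> HashMap<String, String> {
    match file_type {
        FileType::Csv | FileType::Json | FileType::Parquet => options
            .into_iter()
            .map(|(k, v)| {
                if k.contains('.') {
                    // Already namespaced (format or custom): leave untouched.
                    (k, v)
                } else {
                    // Legacy, un-namespaced key: apply the file type namespace.
                    (format!("{}.{}", file_type, k), v)
                }
            })
            .collect(),
        // Other formats have no legacy options to rewrite in this sketch.
        _ => options,
    }
}

fn main() {
    let options: HashMap<String, String> = [
        ("compression", "snappy"),
        ("compression::col1", "zstd(5)"),
        ("parquet.max_row_group_size", "55"),
        ("my_ext.flag", "true"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect();

    let renamed = namespace_legacy_options(options, FileType::Parquet);
    let mut keys: Vec<_> = renamed.keys().cloned().collect();
    keys.sort();
    // Prints the sorted keys after renaming.
    println!("{:?}", keys);
}
```

Because a key counts as namespaced as soon as it contains a `.`, a column-specific key such as `compression::col1` is still treated as legacy and becomes `parquet.compression::col1`, while a custom key like `my_ext.flag` passes through unchanged. One caveat of this shape: if both `compression` and `parquet.compression` are supplied, they map to the same key and `collect` keeps whichever comes last in the map's iteration order.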

Contributor Author


Implemented both changes.

Converted the PR to draft for the time being due to #9594 (comment).

Ok(LogicalPlan::Copy(CopyTo {
input: Arc::new(input),
output_url: statement.target,
32 changes: 32 additions & 0 deletions datafusion/sqllogictest/test_files/copy.slt
@@ -484,3 +484,35 @@ COPY (select col2, sum(col1) from source_table
# Copy from table with non literal
query error DataFusion error: SQL error: ParserError\("Expected ',' or '\)' after option definition, found: \+"\)
COPY source_table to '/tmp/table.parquet' (row_group_size 55 + 102);


# Legacy Format Options Support
Contributor


❤️


# Copy with legacy format options for Parquet
query IT
COPY source_table TO 'test_files/scratch/copy/format_table/' (
format parquet,
compression snappy,
'compression::col1' 'zstd(5)'
);
----
2

# Copy with legacy format options for JSON
query IT
COPY source_table to 'test_files/scratch/copy/format_table' (format json, compression gzip);
----
2

# Copy with legacy format options for CSV
query IT
COPY source_table to 'test_files/scratch/copy/format_table' (
format csv,
has_header false,
compression xz,
datetime_format '%FT%H:%M:%S.%9f',
delimiter ';',
null_value 'NULLVAL'
);
----
2