Describe the bug
We add our own metadata to the parquet file. Currently, we do so using the WriterProperties' kv_metadata and the ArrowWriter. We want to start performing parquet writes with datafusion's ParquetSink; however, a recent change has removed the ability to add our own metadata.
There was a change to unify the writer options across sink types, specifically to give COPY TO and CREATE EXTERNAL TABLE a uniform configuration. Users can now specify the configuration at the SQL level API (e.g. COPY <src> TO <sink> (<config_options>)). This was a good high-level change; however, a side effect of the implementation was the removal of the ability to add our own metadata.
The current implementation (after the above change) now derives the writer properties from the TableParquetOptions. This conversion always sets the sorting_columns and user-defined kv_metadata to None, as demonstrated in the first commit of the fix.
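To make the shape of the problem concrete, here is a minimal, std-only sketch of an options-to-writer-properties conversion that drops user metadata, next to one that carries it through. It is a simplified model and not DataFusion's code: TableOptionsSketch, WriterPropsSketch, the user_metadata field, and both conversion functions are hypothetical stand-ins for TableParquetOptions, WriterProperties, and the real conversion.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for DataFusion's `TableParquetOptions`.
/// The `user_metadata` field is an assumption made for illustration.
#[derive(Debug, Clone, Default)]
pub struct TableOptionsSketch {
    pub compression: Option<String>,
    pub user_metadata: BTreeMap<String, String>,
}

/// Hypothetical stand-in for parquet's `WriterProperties`.
#[derive(Debug, Clone, PartialEq)]
pub struct WriterPropsSketch {
    pub compression: Option<String>,
    pub kv_metadata: Option<Vec<(String, String)>>,
}

/// Models the reported behavior: the conversion hardcodes
/// `kv_metadata` to `None`, whatever the caller configured.
pub fn to_writer_props_lossy(opts: &TableOptionsSketch) -> WriterPropsSketch {
    WriterPropsSketch {
        compression: opts.compression.clone(),
        kv_metadata: None,
    }
}

/// Models the expected behavior: user metadata is carried through,
/// and an empty map still yields `None`.
pub fn to_writer_props_preserving(opts: &TableOptionsSketch) -> WriterPropsSketch {
    let kv: Vec<(String, String)> = opts
        .user_metadata
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    WriterPropsSketch {
        compression: opts.compression.clone(),
        kv_metadata: if kv.is_empty() { None } else { Some(kv) },
    }
}
```

With one entry in user_metadata, the lossy conversion returns kv_metadata as None while the preserving one returns that entry. All other options pass through unchanged in both cases.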
To Reproduce
The hardcoded setting of the user metadata to None is demonstrated in this commit.
Expected behavior
The expected behavior is to be able to set our own metadata. Ideally, user-inserted metadata would also be available as an option at the SQL level API.
The expected outcome is demonstrated in this commit.
Additional context
No response