@suremarc got this error when writing to a partitioned table:
This feature is not implemented: it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8)
Here is a repro using datafusion-cli:
CREATE EXTERNAL TABLE lz4_raw_compressed_larger
STORED AS PARQUET
PARTITIONED BY (partition)
LOCATION 'data/';
INSERT INTO lz4_raw_compressed_larger VALUES ('non-partition-value', 'partition');
Here's a zip file with a single file in it, data/partition=A/lz4_raw_compressed_larger.parquet.
I noticed the unit tests specify the schema explicitly, but I am guessing if you have DataFusion infer the schema, the partition columns are encoded as dictionaries. I think this will limit the usefulness of this feature if partitioned writes don't work with tables whose schemas are inferred.
Hm, I am a little confused about why DataFusion infers the schema of UTF8 data as Dictionary(some int type, UTF8).
🤔 will have to look into it. It does seem that #7891, #7892, and some of the inconveniences reported by @theelderbeever in #7860 are all related.
Perhaps the partitioning code could accept any Arrow array type that can be explicitly cast to UTF8, rather than strictly UTF8 only... I assume that since these Dictionary columns represent string data, they can be cast to a plain UTF8 array without a panic or error.
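To make the suggestion concrete, here is a minimal, std-only sketch of the idea: normalize the partition column to plain strings first, then partition as before. It does not use DataFusion or arrow-rs; `DictColumn`, `PartitionColumn`, `to_utf8`, and `partition_rows` are hypothetical names made up for illustration, and real code would presumably go through Arrow's cast kernel instead.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a dictionary-encoded string column, conceptually
/// Dictionary(UInt16, Utf8). Hypothetical type, NOT the arrow-rs one.
struct DictColumn {
    keys: Vec<u16>,      // one key per row, indexing into `values`
    values: Vec<String>, // the distinct strings
}

/// A partition column that is either plain strings or dictionary-encoded.
enum PartitionColumn {
    Utf8(Vec<String>),
    Dictionary(DictColumn),
}

/// "Cast" a supported column to plain strings.
/// Returns an error (rather than panicking) if a key is out of range.
fn to_utf8(col: &PartitionColumn) -> Result<Vec<String>, String> {
    match col {
        PartitionColumn::Utf8(v) => Ok(v.clone()),
        PartitionColumn::Dictionary(d) => d
            .keys
            .iter()
            .map(|&k| {
                d.values
                    .get(k as usize)
                    .cloned()
                    .ok_or_else(|| format!("dictionary key {k} out of range"))
            })
            .collect(),
    }
}

/// Group row indices by hive-style directory name, e.g. "partition=A".
/// The column is normalized with `to_utf8` first, so both encodings work.
fn partition_rows(
    col: &PartitionColumn,
    name: &str,
) -> Result<BTreeMap<String, Vec<usize>>, String> {
    let mut groups = BTreeMap::new();
    for (row, value) in to_utf8(col)?.into_iter().enumerate() {
        groups
            .entry(format!("{name}={value}"))
            .or_insert_with(Vec::new)
            .push(row);
    }
    Ok(groups)
}
```

With this shape, a dictionary column with keys [0, 1, 0] over the values ["A", "B"] lands in the same partition=A and partition=B groups as the equivalent plain string column, so the writer would not need a separate code path for each encoding.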
Originally posted by @suremarc in #7801 (comment)