Error writing to a partitioned table: : it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8) #7891

Closed
alamb opened this issue Oct 20, 2023 · 2 comments · Fixed by #7896

alamb commented Oct 20, 2023

@suremarc got this error when writing to a partitioned table:

This feature is not implemented: it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8)

Here is a repro using datafusion-cli:

CREATE EXTERNAL TABLE lz4_raw_compressed_larger
STORED AS PARQUET
PARTITIONED BY (partition)
LOCATION 'data/';

INSERT INTO lz4_raw_compressed_larger VALUES ('non-partition-value', 'partition');

Here's a zip file with a single file in it, data/partition=A/lz4_raw_compressed_larger.parquet.

I noticed the unit tests specify the schema explicitly, but I am guessing if you have DataFusion infer the schema, the partition columns are encoded as dictionaries. I think this will limit the usefulness of this feature if partitioned writes don't work with tables whose schemas are inferred.
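For context, Hive-style partitioning stores a partition column's value in the directory name (here `partition=A`) rather than inside the Parquet file. The sketch below shows that convention by pulling `column=value` pairs out of a path. The helper name `hive_partitions` is hypothetical and is not a DataFusion API; it only illustrates where the partition value comes from.

```rust
/// Extract `(column, value)` pairs from Hive-style path segments such as
/// `partition=A`. Segments without `=` (plain directories, the file name)
/// are skipped. Hypothetical helper for illustration, not a DataFusion API.
pub fn hive_partitions(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| segment.split_once('='))
        .map(|(col, val)| (col.to_string(), val.to_string()))
        .collect()
}
```

Applied to the path of the file in the zip, this yields a single pair: column `partition` with value `A`.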

Originally posted by @suremarc in #7801 (comment)

@devinjdangelo

Hm, I am a little confused about why DataFusion is inferring the schema of UTF8 data as Dictionary(some int type, UTF8).

🤔 will have to look into it. It does seem that #7891, #7892, and some of the inconveniences reported by @theelderbeever in #7860 are all related.

Perhaps the partitioning code could accept any arrow array type which can be explicitly cast to UTF8, rather than only strictly UTF8... I assume since these Dictionary columns are representing string data, they can be cast to a plain UTF8 array without panic/error.
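To make the cast idea concrete, here is a minimal, standard-library-only sketch of what converting a dictionary-encoded string column to plain strings involves. `DictColumn` and `decode_to_strings` are hypothetical names used as a toy model; they are not the arrow-rs `DictionaryArray` type or its cast kernel, and the actual fix in the codebase may look quite different.

```rust
/// Toy stand-in for a dictionary-encoded string column: each row stores a
/// small integer key that indexes into a list of distinct values.
/// (Hypothetical type for illustration; not the arrow-rs `DictionaryArray`.)
pub struct DictColumn {
    pub keys: Vec<Option<u16>>,
    pub values: Vec<String>,
}

/// "Cast" the dictionary column to a plain string column by looking up
/// every key. Null keys stay null; an out-of-range key produces an error
/// rather than a panic.
pub fn decode_to_strings(col: &DictColumn) -> Result<Vec<Option<String>>, String> {
    col.keys
        .iter()
        .map(|key| match key {
            None => Ok(None),
            Some(k) => col
                .values
                .get(*k as usize)
                .cloned()
                .map(Some)
                .ok_or_else(|| format!("key {} is out of range", k)),
        })
        .collect()
}
```

With keys `[0, 1, 0]` and values `["A", "B"]`, the decoded column is `["A", "B", "A"]`, which is the plain string form the partitioning code expects.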

@devinjdangelo

Ok, I read the arrow-rs docs on dictionary array types, so I understand what that means now... I took a stab at solving this in #7896
