feat: improve string statistics display in datafusion-cli parquet_metadata function
#8535
Which issue does this PR close?
Closes #8464
Rationale for this change
What changes are included in this PR?
Output for the data_index_bloom_encoding_stats.parquet file:
DataFusion
DuckDB
Are these changes tested?
Yes
Are there any user-facing changes?
Note
One thing I noticed while testing: for the parquet-testing/data/hadoop_lz4_compressed.parquet file, the output was still a byte array. I checked, and the converted type was None for that column, so I'm not sure whether blindly converting byte arrays into UTF-8 strings would be the right approach. Open to suggestions.
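To make the trade-off in the note concrete, here is a minimal sketch of one conservative option: decode a byte-array statistic as a string only when the column is annotated as a string and the bytes are valid UTF-8, and otherwise keep the raw byte display. This is not the code in this PR; the function name `display_byte_array_stat` and the `annotated_as_string` flag are hypothetical and stand in for whatever the Parquet column metadata reports.

```rust
/// Hypothetical helper (not part of this PR): render a Parquet
/// byte-array statistic for display.
///
/// `annotated_as_string` is an assumed flag meaning "the column
/// metadata marks this column as a UTF-8 string". When it is false
/// (e.g. the converted type is None), the bytes are left undecoded.
fn display_byte_array_stat(bytes: &[u8], annotated_as_string: bool) -> String {
    if annotated_as_string {
        // Only trust the annotation if the bytes really are valid UTF-8.
        if let Ok(s) = std::str::from_utf8(bytes) {
            return s.to_string();
        }
    }
    // Fallback: show the raw bytes using Rust's debug formatting.
    format!("{:?}", bytes)
}
```

Under this sketch, a column without a string annotation, like the one in hadoop_lz4_compressed.parquet, would keep its byte-array output rather than being decoded by guesswork.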