From 15d44c7ffd614648018d2e98c7e6494fb077f5a1 Mon Sep 17 00:00:00 2001 From: Phillip LeBlanc Date: Tue, 22 Oct 2024 04:51:37 +0900 Subject: [PATCH] Improve docs for Parquet file format to indicate we support all compression/encodings (#553) --- spiceaidocs/docs/reference/file_format.md | 30 +++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/spiceaidocs/docs/reference/file_format.md b/spiceaidocs/docs/reference/file_format.md index 23e2c38f7..a2d50e488 100644 --- a/spiceaidocs/docs/reference/file_format.md +++ b/spiceaidocs/docs/reference/file_format.md @@ -6,9 +6,35 @@ pagination_prev: 'reference/index' pagination_next: null --- -Spice currently supports CSV and Parquet data file-formats. Support for Iceberg and other file-formats are on the roadmap. +Spice currently supports CSV and Parquet data file-formats for data connectors that can read files from a file system or cloud object storage (i.e. [`s3://`](../components/data-connectors/s3.md), [`abfs://`](../components/data-connectors/abfs.md), [`file://`](../components/data-connectors/file.md), etc.). Support for Iceberg and other file-formats are on the roadmap. -The parameters supported for specific file-format are detailed on this page. +The parameters supported for specific file-formats are detailed on this page. + +## Parquet + +Spice automatically supports reading any Parquet file, regardless of the compression codec or data encoding used. + +Compression codecs: + +- [`UNCOMPRESSED`](https://parquet.apache.org/docs/file-format/data-pages/compression/#uncompressed) +- [`SNAPPY`](https://parquet.apache.org/docs/file-format/data-pages/compression/#snappy) +- [`GZIP`](https://parquet.apache.org/docs/file-format/data-pages/compression/#gzip) +- [`LZO`](https://parquet.apache.org/docs/file-format/data-pages/compression/#lzo) +- [`BROTLI`](https://parquet.apache.org/docs/file-format/data-pages/compression/#brotli) +- [`LZ4`](https://parquet.apache.org/docs/file-format/data-pages/compression/#lz4) (deprecated in favor of `LZ4_RAW`) +- [`LZ4_RAW`](https://parquet.apache.org/docs/file-format/data-pages/compression/#lz4_raw) +- [`ZSTD`](https://parquet.apache.org/docs/file-format/data-pages/compression/#zstd) + +Data encodings: + +- [`PLAIN`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#plain-plain--0) +- [`PLAIN_DICTIONARY` / `RLE_DICTIONARY`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8) +- [`RLE`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#run-length-encoding--bit-packing-hybrid-rle--3) +- [`BIT_PACKED`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#bit-packed-deprecated-bit_packed--4) (deprecated in favor of `RLE`) +- [`DELTA_BINARY_PACKED`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#delta-binary-packing-delta_binary_packed--5) +- [`DELTA_LENGTH_BYTE_ARRAY`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#delta-length-byte-array-delta_length_byte_array--6) +- [`DELTA_BYTE_ARRAY`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#delta-strings-delta_byte_array--7) +- [`BYTE_STREAM_SPLIT`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#byte-stream-split-byte_stream_split--9) ## CSV