GH-37312: [Python][Docs] Update Python docstrings to reflect new parquet encoding option (#38070)

### Rationale for this change

Now that Parquet C++ has completed support for all encodings, we can document them in the Python docs.

### What changes are included in this PR?

Add the supported encodings to the Python docstrings.

### Are these changes tested?

No

### Are there any user-facing changes?

No

* Closes: #37312

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
3 people authored Oct 17, 2023
1 parent 44a00fc commit a5043e7
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions python/pyarrow/parquet/core.py
@@ -767,13 +767,16 @@ def _sanitize_table(table, new_schema, flavor):
     Other features such as compression algorithms or the new serialized
     data page format must be enabled separately (see 'compression' and
     'data_page_version').
-use_dictionary : bool or list
+use_dictionary : bool or list, default True
     Specify if we should use dictionary encoding in general or only for
     some columns.
-compression : str or dict
+    When encoding the column, if the dictionary size is too large, the
+    column will fall back to ``PLAIN`` encoding. In particular, the ``BOOLEAN``
+    type doesn't support dictionary encoding.
+compression : str or dict, default 'snappy'
     Specify the compression codec, either on a general basis or per-column.
     Valid values: {'NONE', 'SNAPPY', 'GZIP', 'BROTLI', 'LZ4', 'ZSTD'}.
-write_statistics : bool or list
+write_statistics : bool or list, default True
     Specify if we should write statistics in general (default is True) or only
     for some columns.
 use_deprecated_int96_timestamps : bool, default None
@@ -821,7 +824,10 @@ def _sanitize_table(table, new_schema, flavor):
     and should be combined with a compression codec.
 column_encoding : string or dict, default None
     Specify the encoding scheme on a per column basis.
-    Currently supported values: {'PLAIN', 'BYTE_STREAM_SPLIT'}.
+    Can only be used when ``use_dictionary`` is set to False, and
+    cannot be used in combination with ``use_byte_stream_split``.
+    Currently supported values: {'PLAIN', 'BYTE_STREAM_SPLIT',
+    'DELTA_BINARY_PACKED', 'DELTA_LENGTH_BYTE_ARRAY', 'DELTA_BYTE_ARRAY'}.
     Certain encodings are only compatible with certain data types.
     Please refer to the encodings section of `Reading and writing Parquet
     files <https://arrow.apache.org/docs/cpp/parquet.html#encodings>`_.