PARQUET-2352: Allow truncation of row group min_values/max_value statistics (#216)

This updates the spec to allow truncation of row group min_value/max_value statistics
so that readers can take advantage of row group pruning for predicates on columns
containing long strings.
PARQUET-1685 (https://issues.apache.org/jira/browse/PARQUET-1685) already added a feature to
parquet-mr that lets users deviate from the current spec and configure truncation of row group statistics.
This change also adds is_max_value_exact/is_min_value_exact so that writers can indicate
whether max_value/min_value are the actual maximum and minimum values found in the column chunk.
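
As an illustration of the writer side, the sketch below shows one way truncated bounds and the matching exactness flags could be derived. It is not the parquet-mr implementation: the helper names `truncate_min` and `truncate_max` and the `max_len` parameter are hypothetical, and the code assumes values are compared as unsigned bytes in lexicographic order. It works on raw bytes only; for string columns a writer would also have to keep the result valid UTF-8, which this sketch does not attempt.

```python
from typing import Tuple


def truncate_min(value: bytes, max_len: int) -> Tuple[bytes, bool]:
    """Return (lower_bound, is_exact) for the smallest value in a chunk.

    A prefix of a byte string never sorts after the full string, so plain
    truncation already yields a valid lower bound.
    """
    if len(value) <= max_len:
        return value, True
    return value[:max_len], False


def truncate_max(value: bytes, max_len: int) -> Tuple[bytes, bool]:
    """Return (upper_bound, is_exact) for the largest value in a chunk.

    A bare prefix would sort before the full string, so the last byte of
    the prefix is incremented to push the bound above the real maximum.
    """
    if len(value) <= max_len:
        return value, True
    prefix = bytearray(value[:max_len])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # No shorter upper bound exists; fall back to the untruncated value.
        return value, True
    prefix[-1] += 1
    return bytes(prefix), False


# The example from the spec comment: a one-byte budget gives "B" and "C".
lower, lower_exact = truncate_min(b"Blart Versenwald III", 1)
upper, upper_exact = truncate_max(b"Blart Versenwald III", 1)
```

The returned booleans correspond to what a writer would store in is_min_value_exact and is_max_value_exact.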
raunaqmorarka authored Oct 18, 2023
1 parent 77949ba commit 31f92c7
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion src/main/thrift/parquet.thrift
@@ -216,13 +216,23 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * Lower and upper bound values for the column, determined by its ColumnOrder.
+    *
+    * These may be the actual minimum and maximum values found on a page or column
+    * chunk, but can also be (more compact) values that do not exist on a page or
+    * column chunk. For example, instead of storing "Blart Versenwald III", a writer
+    * may set min_value="B", max_value="C". Such more compact values must still be
+    * valid values within the column's logical type.
     *
     * Values are encoded using PLAIN encoding, except that variable-length byte
     * arrays do not include a length prefix.
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** If true, max_value is the actual maximum value for a column */
+   7: optional bool is_max_value_exact;
+   /** If true, min_value is the actual minimum value for a column */
+   8: optional bool is_min_value_exact;
 }
 
 /** Empty structs to use as logical type annotations */
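
To show how a reader might consume these fields, here is a minimal sketch. The `Stats` dataclass only mirrors the four Thrift fields touched by this change, and the helpers `can_skip_for_equality` and `exact_max` are hypothetical names, not part of any Parquet library. It assumes unsigned bytewise ordering and conservatively treats an unset exactness flag as "not known to be exact".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:
    # Mirrors fields 5-8 of the Thrift Statistics struct (illustrative only).
    min_value: Optional[bytes] = None
    max_value: Optional[bytes] = None
    is_min_value_exact: Optional[bool] = None
    is_max_value_exact: Optional[bool] = None


def can_skip_for_equality(stats: Stats, target: bytes) -> bool:
    """True if a `column = target` predicate cannot match this chunk.

    Truncated bounds are still valid bounds, so pruning does not need the
    exactness flags at all.
    """
    if stats.min_value is not None and target < stats.min_value:
        return True
    if stats.max_value is not None and target > stats.max_value:
        return True
    return False


def exact_max(stats: Stats) -> Optional[bytes]:
    """Return max_value only when the writer marked it as exact.

    This matters when statistics are used to answer a query directly,
    for example MAX(column), rather than only to prune.
    """
    if stats.max_value is not None and stats.is_max_value_exact is True:
        return stats.max_value
    return None


# Bounds from the spec comment: min_value="B", max_value="C", both truncated.
stats = Stats(b"B", b"C", False, False)
skip_alice = can_skip_for_equality(stats, b"Alice")
skip_blart = can_skip_for_equality(stats, b"Blart Versenwald III")
```

Here `skip_alice` is true because "Alice" sorts below the lower bound, while `skip_blart` is false because the real value lies between "B" and "C".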