

feat/copy-to-parquet-parameter:
 Disable dictionary encoding for timestamp columns in Parquet writer and update default max_active_window_runs in TwcsOptions

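 - Parquet writer: removed the separate `schema.timestamp_column()` block, which also forced DELTA_BINARY_PACKED encoding. Dictionary encoding is now disabled for every timestamp column by the existing per-column loop, because for monotonically increasing timestamps the dictionary pages end up larger than the data pages.
 - TwcsOptions: changed the default max_active_window_runs from 1 to 4.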
v0y4g3r committed Jul 10, 2024
1 parent 4ac87c2 commit 5e51546
Showing 2 changed files with 3 additions and 9 deletions.
src/common/datasource/src/file_format/parquet.rs (10 changes: 2 additions, 8 deletions)
@@ -225,14 +225,8 @@ fn column_wise_config(
     mut props: WriterPropertiesBuilder,
     schema: SchemaRef,
 ) -> WriterPropertiesBuilder {
-    // Disable dictionary for timestamp column.
-    if let Some(ts_col) = schema.timestamp_column() {
-        let path = ColumnPath::new(vec![ts_col.name.clone()]);
-        props = props
-            .set_column_dictionary_enabled(path.clone(), false)
-            .set_column_encoding(path, Encoding::DELTA_BINARY_PACKED)
-    }
-
+    // Disable dictionary for timestamp column, since for increasing timestamp column,
+    // the dictionary pages will be larger than data pages.
     for col in schema.column_schemas() {
         if col.data_type.is_timestamp() {
             let path = ColumnPath::new(vec![col.name.clone()]);
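The code comment's claim about dictionary pages can be checked with a back-of-the-envelope estimate. The following is a minimal stdlib-only Rust sketch, not the `parquet` crate API. `estimate` and `SizeEstimate` are hypothetical names. The model makes the following simplifying assumptions:

- Each timestamp is an 8-byte INT64.
- The dictionary page stores every distinct value once in plain form.
- Indices are bit-packed at the minimal bit width.
- Page headers, compression, RLE runs, page-size limits and any writer-side fallback are ignored.

```rust
use std::collections::HashSet;

/// Rough byte counts for one INT64 column chunk under two layouts.
/// Simplified model: ignores page headers, compression, RLE runs, and page-size limits.
struct SizeEstimate {
    /// PLAIN encoding: 8 bytes per value.
    plain: usize,
    /// Dictionary page: each distinct value stored once, 8 bytes each.
    dictionary_page: usize,
    /// Data page of dictionary indices, bit-packed at the minimal width.
    dictionary_indices: usize,
}

impl SizeEstimate {
    fn dictionary_total(&self) -> usize {
        self.dictionary_page + self.dictionary_indices
    }
}

/// Hypothetical helper: estimate sizes for a column of i64 values.
fn estimate(values: &[i64]) -> SizeEstimate {
    let distinct: HashSet<i64> = values.iter().copied().collect();
    let d = distinct.len();
    // Bits needed to address d dictionary entries (at least 1 bit).
    let bit_width = if d <= 1 {
        1
    } else {
        (usize::BITS - (d - 1).leading_zeros()) as usize
    };
    SizeEstimate {
        plain: values.len() * 8,
        dictionary_page: d * 8,
        dictionary_indices: (values.len() * bit_width + 7) / 8,
    }
}

fn main() {
    // Strictly increasing timestamps (ms, one per second): every value is distinct.
    let ts: Vec<i64> = (0..10_000).map(|i| 1_720_569_600_000 + i * 1_000).collect();
    // Low-cardinality column for contrast: only 4 distinct values.
    let low_card: Vec<i64> = (0..10_000).map(|i| i % 4).collect();

    for (name, col) in [("timestamps", &ts), ("low_cardinality", &low_card)] {
        let e = estimate(col);
        println!(
            "{}: plain={} dict_page={} dict_indices={} dict_total={}",
            name, e.plain, e.dictionary_page, e.dictionary_indices, e.dictionary_total()
        );
    }
}
```

The two columns behave very differently under this model:

- **Increasing timestamps:** every value is unique, so the dictionary page is as large as the plain data (80000 bytes), and the 14-bit indices (17500 bytes) come on top of it, for a total of 97500 bytes.
- **Four-value column:** the dictionary costs 32 bytes plus 2500 bytes of indices.

This is the trade-off the commit comment describes.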
src/mito2/src/region/options.rs (2 changes: 1 addition, 1 deletion)
@@ -216,7 +216,7 @@ impl TwcsOptions {
 impl Default for TwcsOptions {
     fn default() -> Self {
         Self {
-            max_active_window_runs: 1,
+            max_active_window_runs: 4,
             max_inactive_window_runs: 1,
             time_window: None,
             remote_compaction: false,
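The following hypothetical model illustrates what raising this default from 1 to 4 might change. It is a stdlib-only sketch, not GreptimeDB code. `TwcsOptionsSketch` and `should_compact` are invented names, and the field types are assumptions. The model also assumes, from the option names alone (this diff does not confirm it), that the picker compacts a time window once its number of sorted runs exceeds the limit for that window. Whether the real comparison is `>` or `>=`, and how the active window is chosen, are also assumptions.

```rust
/// Hypothetical, simplified stand-in for the two run limits in TwcsOptions.
/// Field types are assumptions; the real struct has more fields.
#[derive(Debug, Clone, Copy)]
struct TwcsOptionsSketch {
    max_active_window_runs: usize,
    max_inactive_window_runs: usize,
}

impl Default for TwcsOptionsSketch {
    fn default() -> Self {
        Self {
            max_active_window_runs: 4, // new default in this commit (previously 1)
            max_inactive_window_runs: 1,
        }
    }
}

/// Hypothetical trigger rule: compact a time window once its number of
/// sorted runs exceeds the limit for that kind of window.
fn should_compact(sorted_runs: usize, is_active_window: bool, opts: &TwcsOptionsSketch) -> bool {
    let limit = if is_active_window {
        opts.max_active_window_runs
    } else {
        opts.max_inactive_window_runs
    };
    sorted_runs > limit
}

fn main() {
    let new_default = TwcsOptionsSketch::default();
    // Same options with the pre-commit value for the active window.
    let old_default = TwcsOptionsSketch { max_active_window_runs: 1, ..new_default };
    for runs in 1..=5 {
        println!(
            "active window, {} runs: old default compacts = {}, new default compacts = {}",
            runs,
            should_compact(runs, true, &old_default),
            should_compact(runs, true, &new_default)
        );
    }
}
```

Under this model, the old default merges the active window as soon as a second run appears. The new default tolerates up to four overlapping runs before merging. The likely trade-off is fewer, larger compactions while the window is still receiving writes, at the cost of more files to scan per query.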

