improve config comments.
Rachelint committed Sep 1, 2024
1 parent 0a7b52b commit 318c650
Showing 3 changed files with 96 additions and 96 deletions.
14 changes: 7 additions & 7 deletions datafusion/common/src/config.rs
@@ -344,14 +344,14 @@ config_namespace! {
 
 /// Should DataFusion use the the blocked approach to manage the groups
 /// values and their related states in accumulators. By default, the single
-/// approach will be used, and such group values and states will be managed
-/// using a single big block(can think a `Vec`), obviously as the block growing up,
-/// many copies will be triggered and finally get a bad performance.
+/// approach will be used, values are managed within a single large block
+/// (can think of it as a Vec). As this block grows, it often triggers
+/// numerous copies, resulting in poor performance.
 /// If setting this flag to `true`, the blocked approach will be used.
-/// We will allocate the `block size` capacity for block first, and when we
-/// found the block has been filled to `block size` limit, we will allocate
-/// next block rather than growing current block and copying the data. This
-/// approach can eliminate all unnecessary copies and get a good performance finally.
+/// And the blocked approach allocates capacity for the block
+/// based on a predefined block size firstly. When the block reaches its limit,
+/// we allocate a new block (also with the same predefined block size based capacity)
+// instead of expanding the current one and copying the data.
 /// We plan to make this the default in the future when tests are enough.
 pub enable_aggregation_group_states_blocked_approach: bool, default = false
 }
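To make the trade-off in this doc comment concrete, here is a minimal, std-only Rust sketch under stated assumptions. `BlockedVec` and `single_block_moves` are hypothetical names invented for illustration; this is not DataFusion's actual group-values or accumulator code, and `block_size` is just a constructor parameter standing in for the "predefined block size" the comment mentions. Each block is allocated with its full capacity up front and never grown, while the single-block comparison counts how many existing elements a growing `Vec` may have to copy on each reallocation.

```rust
/// Hypothetical blocked storage for per-group values (illustration only,
/// not DataFusion's implementation): fixed-capacity blocks instead of one
/// growing `Vec`.
pub struct BlockedVec<T> {
    block_size: usize,
    blocks: Vec<Vec<T>>,
}

impl<T> BlockedVec<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, blocks: Vec::new() }
    }

    /// Appends `value` and returns its flat group index.
    pub fn push(&mut self, value: T) -> usize {
        let block_size = self.block_size;
        let need_new_block = self
            .blocks
            .last()
            .map_or(true, |b| b.len() == block_size);
        if need_new_block {
            // Reserve the whole predefined capacity up front; existing blocks
            // are never grown, so their elements are never copied.
            self.blocks.push(Vec::with_capacity(block_size));
        }
        let block_idx = self.blocks.len() - 1;
        let block = &mut self.blocks[block_idx];
        // len < block_size <= capacity, so this push cannot reallocate.
        debug_assert!(block.len() < block.capacity());
        block.push(value);
        block_idx * block_size + block.len() - 1
    }

    /// Maps a flat index to (block, offset) and looks the value up.
    pub fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }

    /// Total number of stored values across all blocks.
    pub fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }
}

/// For comparison, the single-block approach: count the elements a growing
/// `Vec` may have to copy each time it runs out of capacity and reallocates.
pub fn single_block_moves(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut moved = 0;
    for i in 0..n {
        if v.len() == v.capacity() {
            // The next push must reallocate; the old contents may be copied.
            moved += v.len();
        }
        v.push(i as u64);
    }
    moved
}
```

For example, pushing ten values into `BlockedVec::new(4)` yields flat indices 0 through 9 spread over three blocks, and index 9 resolves to block 2, offset 1. The cost of this design is a two-level lookup (a division and a remainder per access), non-contiguous memory, and up to `block_size - 1` unused slots in the last block, in exchange for never copying existing values as the number of groups grows.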
2 changes: 1 addition & 1 deletion datafusion/sqllogictest/test_files/information_schema.slt
@@ -264,7 +264,7 @@ datafusion.execution.aggregate.scalar_update_factor 10 Specifies the threshold f
 datafusion.execution.batch_size 8192 Default batch size while creating new batches, it's especially useful for buffer-in-memory batches since creating tiny batches would result in too much metadata memory consumption
 datafusion.execution.coalesce_batches true When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting
 datafusion.execution.collect_statistics false Should DataFusion collect statistics after listing files
-datafusion.execution.enable_aggregation_group_states_blocked_approach false Should DataFusion use the the blocked approach to manage the groups values and their related states in accumulators. By default, the single approach will be used, and such group values and states will be managed using a single big block(can think a `Vec`), obviously as the block growing up, many copies will be triggered and finally get a bad performance. If setting this flag to `true`, the blocked approach will be used. We will allocate the `block size` capacity for block first, and when we found the block has been filled to `block size` limit, we will allocate next block rather than growing current block and copying the data. This approach can eliminate all unnecessary copies and get a good performance finally. We plan to make this the default in the future when tests are enough.
+datafusion.execution.enable_aggregation_group_states_blocked_approach false Should DataFusion use the the blocked approach to manage the groups values and their related states in accumulators. By default, the single approach will be used, values are managed within a single large block (can think of it as a Vec). As this block grows, it often triggers numerous copies, resulting in poor performance. If setting this flag to `true`, the blocked approach will be used. And the blocked approach allocates capacity for the block based on a predefined block size firstly. When the block reaches its limit, we allocate a new block (also with the same predefined block size based capacity) We plan to make this the default in the future when tests are enough.
 datafusion.execution.enable_recursive_ctes true Should DataFusion support recursive CTEs
 datafusion.execution.keep_partition_by_columns false Should DataFusion keep the columns used for partition_by in the output RecordBatches
 datafusion.execution.listing_table_ignore_subdirectory true Should sub directories be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. `/table/year=2021/month=01/data.parquet`).