Improve ParquetExec and related documentation #10647
@@ -75,7 +75,79 @@ pub use metrics::ParquetFileMetrics;
pub use schema_adapter::{SchemaAdapter, SchemaAdapterFactory, SchemaMapper};
pub use statistics::{RequestedStatistics, StatisticsConverter};

/// Execution plan for scanning one or more Parquet partitions
/// Execution plan for reading one or more Parquet files.
///
/// ```text
///             ▲
///             │
///             │  Produce a stream of
///             │  RecordBatches
///             │
/// ┌───────────────────────┐
/// │                       │
/// │      ParquetExec      │
/// │                       │
/// └───────────────────────┘
///             ▲
///             │  Asynchronously read from one
///             │  or more parquet files via
///             │  ObjectStore interface
///             │
///             │
///   .───────────────────.
///  │                     )
///  │`───────────────────'│
///  │     ObjectStore     │
///  │.───────────────────.│
///  │                     )
///   `───────────────────'
///
/// ```
/// # Features
///
/// Supports the following optimizations:
///
/// * Multi-threaded (aka multi-partition): read from one or more files in
/// parallel. Can read concurrently from multiple row groups from a single file.
///
/// * Predicate push down: skips row groups and pages based on
/// min/max/null_counts in the row group metadata, the page index and bloom
/// filters.
///
/// * Projection pushdown: reads and decodes only the columns required.
///
/// * Limit pushdown: stop execution early after some number of rows are read.
///
/// * Custom readers: controls I/O for accessing pages. See
Review comment (with a suggested change): It's not steering the IO process, it's actually responsible for performing (or not performing) it. For example, a custom impl. could totally NOT use an object store (which is esp. interesting for the metadata bit, see other comment below).

Reply: good call -- updated
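As background for the pushdown features listed above, here is a small usage sketch through the high-level `SessionContext` API, which typically plans a `ParquetExec` under the hood for parquet sources. The file path is a placeholder and error handling is minimal; this is illustrative, not part of the PR.

```rust
use datafusion::error::Result;
use datafusion::prelude::{col, lit, ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    // Read a parquet file through the DataFrame API; the optimized physical
    // plan typically contains a ParquetExec that performs the actual scan.
    let ctx = SessionContext::new();
    let df = ctx
        .read_parquet("data/example.parquet", ParquetReadOptions::default())
        .await?;

    // Filters and projections applied here are candidates for pushdown into
    // the scan (row group / page pruning, column projection).
    let batches = df.filter(col("id").gt(lit(10)))?.collect().await?;
    println!("read {} record batches", batches.len());
    Ok(())
}
```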
/// [`ParquetFileReaderFactory`] for more details.
///
/// * Schema adapters: read parquet files with different schemas into a unified
/// table schema. This can be used to implement "schema evolution". See
/// [`SchemaAdapterFactory`] for more details.
///
/// * metadata_size_hint: controls the number of bytes read from the end of the
Review comment: FWIW this is passed on to the reader (custom or builtin) and the reader uses that to gather the metadata. The reader CAN however use another more precise source for this information or not read the metadata from object store at all (e.g. it could use an extra service, a dataset-based source or some sort of cache).
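To make the hint's role concrete, a rough, self-contained sketch (not DataFusion's actual code) of how a reader could use a size hint when fetching the parquet footer. `fetch` is a stand-in for an object store range read; the only parquet detail relied on is that a file ends with the metadata, a 4-byte little-endian metadata length, and the `PAR1` magic.

```rust
/// Sketch: read the parquet footer using a size hint. `fetch(start, end)`
/// stands in for an object store range request returning bytes [start, end).
fn read_footer(
    file_size: usize,
    metadata_size_hint: Option<usize>,
    fetch: impl Fn(usize, usize) -> Vec<u8>,
) -> Vec<u8> {
    const SUFFIX: usize = 8; // 4-byte metadata length + 4-byte "PAR1" magic
    assert!(file_size >= SUFFIX, "file too small to be parquet");

    // Read at least the fixed-size suffix, or the hinted number of bytes.
    let guess = metadata_size_hint.unwrap_or(SUFFIX).clamp(SUFFIX, file_size);
    let suffix = fetch(file_size - guess, file_size);

    assert_eq!(&suffix[guess - 4..], b"PAR1", "missing parquet magic");
    let len_bytes: [u8; 4] = suffix[guess - 8..guess - 4].try_into().unwrap();
    let metadata_len = u32::from_le_bytes(len_bytes) as usize;

    if metadata_len + SUFFIX <= guess {
        // The hint was large enough: the metadata is already in the bytes we have.
        suffix[guess - SUFFIX - metadata_len..guess - SUFFIX].to_vec()
    } else {
        // Hint too small: one more read, now with the exact metadata range.
        fetch(file_size - SUFFIX - metadata_len, file_size - SUFFIX)
    }
}
```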
/// file in the initial I/O.
///
/// # Execution Overview
///
/// * Step 1: [`ParquetExec::execute`] is called, returning a [`FileStream`]
/// configured to open parquet files with a [`ParquetOpener`].
///
/// * Step 2: When the stream is polled, the [`ParquetOpener`] is called to open
/// the file.
///
/// * Step 3: The `ParquetOpener` gets the file metadata by reading the footer,
/// and applies any predicates and projections to determine what pages must be
/// read.
Review comment: It gets the metadata from the
///
/// * Step 4: The stream begins reading data, fetching the required pages
/// and incrementally decoding them.
///
/// * Step 5: As each [`RecordBatch`] is read, it may be adapted by a
/// [`SchemaAdapter`] to match the table schema. By default missing columns are
/// filled with nulls, but this can be customized via [`SchemaAdapterFactory`].
///
/// [`RecordBatch`]: arrow::record_batch::RecordBatch
#[derive(Debug, Clone)]
pub struct ParquetExec {
    /// Base configuration for this scan
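Regarding Step 5 above: a rough sketch of the default null-filling behavior a schema adapter provides, written against plain arrow-rs APIs rather than the actual `SchemaAdapter` trait (the real implementation may also handle type casts and other mapping rules beyond this).

```rust
use std::sync::Arc;

use arrow::array::{new_null_array, ArrayRef};
use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

/// Sketch: project a batch read from one file onto the wider table schema,
/// filling columns the file does not contain with all-null arrays.
fn adapt_to_table_schema(
    batch: &RecordBatch,
    table_schema: &SchemaRef,
) -> Result<RecordBatch, ArrowError> {
    let columns: Vec<ArrayRef> = table_schema
        .fields()
        .iter()
        .map(|field| match batch.column_by_name(field.name()) {
            // Column exists in this file: pass it through unchanged.
            Some(col) => Arc::clone(col),
            // Column missing from this file: synthesize an all-null array.
            None => new_null_array(field.data_type(), batch.num_rows()),
        })
        .collect();
    RecordBatch::try_new(Arc::clone(table_schema), columns)
}
```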
@@ -85,9 +157,9 @@ pub struct ParquetExec {
    metrics: ExecutionPlanMetricsSet,
    /// Optional predicate for row filtering during parquet scan
    predicate: Option<Arc<dyn PhysicalExpr>>,
    /// Optional predicate for pruning row groups
    /// Optional predicate for pruning row groups (derived from `predicate`)
    pruning_predicate: Option<Arc<PruningPredicate>>,
    /// Optional predicate for pruning pages
    /// Optional predicate for pruning pages (derived from `predicate`)
    page_pruning_predicate: Option<Arc<PagePruningPredicate>>,
    /// Optional hint for the size of the parquet metadata
    metadata_size_hint: Option<usize>,
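To relate `predicate` to `pruning_predicate`: a toy illustration (not the actual `PruningPredicate` machinery) of how per-row-group min/max statistics let a scan skip row groups that cannot possibly satisfy a predicate such as `x > threshold`.

```rust
/// Per-row-group statistics for a single column, as found in parquet metadata.
/// `min_x` would drive pruning for predicates like `x < threshold`.
struct RowGroupStats {
    min_x: i64,
    max_x: i64,
}

/// Sketch: return the indices of row groups that might contain rows with
/// `x > threshold`; the others are skipped without being read or decoded.
fn prune_row_groups(stats: &[RowGroupStats], threshold: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max_x > threshold) // max too small => no row can match
        .map(|(idx, _)| idx)
        .collect()
}

#[test]
fn prunes_row_groups_below_threshold() {
    let stats = vec![
        RowGroupStats { min_x: 0, max_x: 5 },  // pruned: every value is <= 10
        RowGroupStats { min_x: 7, max_x: 42 }, // kept: may contain x > 10
    ];
    assert_eq!(prune_row_groups(&stats, 10), vec![1]);
}
```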
@@ -642,11 +714,22 @@ fn should_enable_page_index(
        .unwrap_or(false)
}

/// Factory of parquet file readers.
/// Interface for creating [`AsyncFileReader`]s to read parquet files.
///
/// This interface is used by [`ParquetOpener`] in order to create readers for
/// parquet files. Implementations of this trait can be used to provide custom
Review comment: What's "this trait" in this case? I guess you're referring to:

    The combined implementations of [`ParquetFileReaderFactory`] and [`AsyncFileReader`]
    can be used to provide custom data access operations such as
    pre-cached data, I/O coalescing, etc.

Reply: Excellent idea. I did so
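As an aside, a simplified sketch of the "combined implementations" idea from the suggestion above. The trait and type names below are stand-ins, not the real `ParquetFileReaderFactory` / `AsyncFileReader` signatures (which are async and object-store aware); they only show a factory whose readers serve pre-cached bytes without touching an object store.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the real traits, to show the division of labor:
// the factory decides how a file will be read, the reader performs the I/O.
trait FileReader {
    fn read_range(&self, start: usize, end: usize) -> Vec<u8>;
}

trait ReaderFactory {
    fn create_reader(&self, path: &str) -> Box<dyn FileReader>;
}

/// A reader backed entirely by memory: no object store request is ever made.
struct CachedReader {
    bytes: Arc<Vec<u8>>,
}

impl FileReader for CachedReader {
    fn read_range(&self, start: usize, end: usize) -> Vec<u8> {
        self.bytes[start..end].to_vec()
    }
}

/// A factory handing out cached readers for files that were pre-fetched.
struct CachingFactory {
    cache: HashMap<String, Arc<Vec<u8>>>,
}

impl ReaderFactory for CachingFactory {
    fn create_reader(&self, path: &str) -> Box<dyn FileReader> {
        let bytes = self.cache.get(path).expect("file was pre-fetched").clone();
        Box::new(CachedReader { bytes })
    }
}
```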
/// data access operations such as pre-cached data, I/O coalescing, etc.
///
/// Provides means to implement custom data access interface.
/// [`DefaultParquetFileReaderFactory`] by default returns a
/// [`ParquetObjectReader`].
pub trait ParquetFileReaderFactory: Debug + Send + Sync + 'static {
    /// Provides `AsyncFileReader` over parquet file specified in `FileMeta`
    /// Provides an `AsyncFileReader` for reading data from a parquet file specified
    ///
    /// # Arguments
    /// * partition_index - Index of the partition (for reporting metrics)
    /// * file_meta - The file to be read
    /// * metadata_size_hint - If specified, the first IO reads this many bytes from the footer
    /// * metrics - Execution metrics
    fn create_reader(
        &self,
        partition_index: usize,
@@ -663,13 +746,20 @@ pub struct DefaultParquetFileReaderFactory {
}

impl DefaultParquetFileReaderFactory {
    /// Create a factory.
    /// Create a new `DefaultParquetFileReaderFactory`.
    pub fn new(store: Arc<dyn ObjectStore>) -> Self {
        Self { store }
    }
}

/// Implements [`AsyncFileReader`] for a parquet file in object storage
/// Implements [`AsyncFileReader`] for a parquet file in object storage.
///
/// This implementation uses the [`ParquetObjectReader`] to read data from the
/// object store on demand, as required, tracking the number of bytes read.
///
/// This implementation does not coalesce I/O operations or cache bytes. Such
/// optimizations can be done either at the object store level or by providing a
/// custom implementation of [`ParquetFileReaderFactory`].
pub(crate) struct ParquetFileReader {
    file_metrics: ParquetFileMetrics,
    inner: ParquetObjectReader,
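On the "does not coalesce I/O" point: a small sketch of what coalescing could look like inside a custom reader, merging nearby byte ranges so several page reads become one larger request. The gap threshold is an assumed tuning knob for the sketch, not a DataFusion setting.

```rust
use std::ops::Range;

/// Sketch: merge byte ranges that are within `max_gap` bytes of each other,
/// trading a little over-read for fewer object store requests.
fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        // Extend the previous range when the new one starts close enough to it,
        // otherwise start a new request.
        if let Some(prev) = merged.last_mut() {
            if r.start <= prev.end + max_gap {
                prev.end = prev.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

#[test]
fn merges_nearby_ranges() {
    let pages = vec![0..100, 110..200, 10_000..10_050];
    assert_eq!(coalesce_ranges(pages, 64), vec![0..200, 10_000..10_050]);
}
```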
Review comment: I would call this "concurrency" instead of "multi-threading". IIRC we don't implement ANY threading in this operator and solely rely on tokio to dispatch concurrent bits for us. I think it's fine to mention that the concurrency in this operator CAN lead to multi-core usage under specific circumstances.
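A sketch of the "concurrency, not threading" distinction: the pattern is closer to issuing many async reads and letting tokio schedule them (possibly across cores on a multi-threaded runtime) than to spawning threads explicitly. The helper below is illustrative only and does not mirror DataFusion's internals.

```rust
use futures::stream::{self, StreamExt};

/// Sketch: read several files with at most `concurrency` requests in flight.
/// No threads are spawned here; tokio decides where the futures actually run.
async fn fetch_all(paths: Vec<String>, concurrency: usize) -> Vec<usize> {
    stream::iter(paths)
        .map(|path| async move {
            // Stand-in for an async object store read of one parquet file.
            simulated_read(&path).await
        })
        .buffer_unordered(concurrency)
        .collect()
        .await
}

async fn simulated_read(path: &str) -> usize {
    // Pretend "bytes read" is just the path length.
    path.len()
}

#[tokio::main]
async fn main() {
    let sizes = fetch_all(vec!["a.parquet".into(), "b.parquet".into()], 8).await;
    println!("{sizes:?}");
}
```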