"Partition pruning" for s3 #37356
@@ -250,7 +250,10 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
: columns_without_low_cardinality.front().column->size();

auto res = executeWithoutLowCardinalityColumns(columns_without_low_cardinality, dictionary_type, new_input_rows_count, dry_run);
auto keys = res->convertToFullColumnIfConst();
bool res_is_constant = isColumnConst(*res);
When the default implementation is used, Const(LowCardinality(...)) loses its constant property and becomes LowCardinality(...), which in turn disables pruning with virtual columns. Here is the fix.
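To make the constness issue concrete, here is a toy model of it. `ToyColumn`, `executeInner`, `wrapLosingConst` and `wrapKeepingConst` are hypothetical names invented for this sketch; they are not ClickHouse classes or functions, and the real code operates on `ColumnPtr` objects rather than flags.

```cpp
#include <string>

// Toy stand-in for a column. Not ClickHouse's IColumn; the two flags only
// model "wrapped in Const" and "wrapped in LowCardinality".
struct ToyColumn
{
    std::string value;
    bool is_const;
    bool is_low_cardinality;
};

// Models the inner call that runs on arguments with LowCardinality stripped.
// It returns a constant result when its input was constant.
ToyColumn executeInner(const ToyColumn & arg)
{
    return ToyColumn{arg.value + "!", arg.is_const, false};
}

// Problem case: the result is materialized and re-wrapped in LowCardinality,
// so the information that it was constant is dropped.
ToyColumn wrapLosingConst(const ToyColumn & arg)
{
    ToyColumn res = executeInner(arg);
    return ToyColumn{res.value, false, true};
}

// Fixed case: remember constness before materializing, then restore it.
ToyColumn wrapKeepingConst(const ToyColumn & arg)
{
    ToyColumn res = executeInner(arg);
    bool res_is_constant = res.is_const;
    return ToyColumn{res.value, res_is_constant, true};
}
```

In this model, a caller that checks `is_const` to decide whether an expression over virtual columns can be evaluated up front only succeeds with `wrapKeepingConst`.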
bool has_wildcards = s3_configuration.uri.bucket.find(PARTITION_ID_WILDCARD) != String::npos
|| keys.back().find(PARTITION_ID_WILDCARD) != String::npos;
if (partition_by && has_wildcards)
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Reading from a partitioned S3 storage is not implemented yet");
Reading from a partitioned S3 storage is not possible for now (it always returns an empty result). Let's throw an exception instead.
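The wildcard check above can be reproduced in isolation as a minimal sketch. The helper name `hasPartitionWildcard` is hypothetical, and the placeholder value `{_partition_id}` is an assumption about what `PARTITION_ID_WILDCARD` expands to; the diff itself does not show it.

```cpp
#include <string>
#include <vector>

// Assumed placeholder value; the diff only shows the constant's name.
const std::string PARTITION_ID_WILDCARD = "{_partition_id}";

// Returns true when the bucket or the last key contains the partition
// placeholder, mirroring the has_wildcards expression in the diff.
// Unlike the diff, this sketch also guards against an empty key list.
bool hasPartitionWildcard(const std::string & bucket, const std::vector<std::string> & keys)
{
    if (bucket.find(PARTITION_ID_WILDCARD) != std::string::npos)
        return true;
    return !keys.empty() && keys.back().find(PARTITION_ID_WILDCARD) != std::string::npos;
}
```

A caller would combine this with the presence of a PARTITION BY clause and refuse the read when both hold, which is what the thrown NOT_IMPLEMENTED exception does.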
select * from s3(s3_conn, filename='test_02302_*', format=Parquet) where _file like '%5';

-- Test s3 table with explicit keys (no glob)
-- TODO support truncate table function
In order to construct an S3 storage with explicit keys, I have to apply the following clumsy steps. Things can be improved:
- allow truncate table function s3(...)
- allow creating an S3 storage with a list of keys (or even different buckets)
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Allow pruning the list of files via virtual columns such as _file and _path when reading from S3. This is for #37174, #23494.
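As a rough illustration of what pruning by a virtual column means, the sketch below filters a list of object keys by a predicate on the file name before anything would be read. It is a simplified model, not the PR's implementation: `fileName`, `pruneKeys` and `endsWith5` are hypothetical helpers, and the predicate stands in for a condition such as `_file like '%5'` from the test query.

```cpp
#include <functional>
#include <string>
#include <vector>

// Derives the _file value from a key: the part after the last '/'.
std::string fileName(const std::string & key)
{
    auto pos = key.rfind('/');
    return pos == std::string::npos ? key : key.substr(pos + 1);
}

// Keeps only the keys whose file name satisfies the predicate, so that
// non-matching objects are never opened.
std::vector<std::string> pruneKeys(
    const std::vector<std::string> & keys,
    const std::function<bool(const std::string &)> & file_predicate)
{
    std::vector<std::string> kept;
    for (const auto & key : keys)
        if (file_predicate(fileName(key)))
            kept.push_back(key);
    return kept;
}

// Stand-in for the condition _file like '%5'.
bool endsWith5(const std::string & file)
{
    return !file.empty() && file.back() == '5';
}
```

Calling `pruneKeys` with the keys `dir/test_02302_4`, `dir/test_02302_5` and `test_02302_15` and the `endsWith5` predicate keeps the last two and drops the first.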