Filter pipeline support for datatype conversions based on filtered output datatype. #4165

Merged on Jul 28, 2023 (22 commits)

@shaunrd0 shaunrd0 commented Jul 12, 2023

Passes each filter's output datatype to the next filter in the pipeline, which adds support for type conversions within a pipeline. Also refactors pipeline validation, adding Filter::accepts_input_datatype and Filter::output_datatype from #4057. Derived filters can override these methods if they accept only certain input datatypes or convert the tile datatype when the filter is applied.


TYPE: IMPROVEMENT
DESC: Filter pipeline support for datatype conversions based on filtered output datatype.
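
As a rough illustration of the idea described above, the sketch below threads a datatype through a pipeline, asking each filter whether it accepts the type it would receive and what type it would emit. Only the method names `accepts_input_datatype` and `output_datatype` come from this PR; the signatures, the `Datatype` enum, the two example filters, and `validate_pipeline` are assumptions made for the example and are not the library's actual code.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for the library's datatype enum.
enum class Datatype { UINT8, INT32, FLOAT32, FLOAT64 };

// Simplified filter interface. The method names follow the PR description;
// the signatures are assumed for illustration.
class Filter {
 public:
  virtual ~Filter() = default;
  // Default: a filter accepts any input datatype.
  virtual bool accepts_input_datatype(Datatype) const { return true; }
  // Default: a filter leaves the datatype unchanged.
  virtual Datatype output_datatype(Datatype input) const { return input; }
};

// Hypothetical filter that accepts only floats and emits 32-bit integers.
class FloatToIntFilter : public Filter {
 public:
  bool accepts_input_datatype(Datatype t) const override {
    return t == Datatype::FLOAT32 || t == Datatype::FLOAT64;
  }
  Datatype output_datatype(Datatype) const override { return Datatype::INT32; }
};

// Hypothetical filter restricted to integer input; the type passes through.
class IntOnlyFilter : public Filter {
 public:
  bool accepts_input_datatype(Datatype t) const override {
    return t == Datatype::UINT8 || t == Datatype::INT32;
  }
};

using Pipeline = std::vector<std::unique_ptr<Filter>>;

// Feeds each filter's output type to the next filter. Returns the final
// output type, or std::nullopt if any filter rejects the type it receives.
std::optional<Datatype> validate_pipeline(
    const Pipeline& pipeline, Datatype schema_type) {
  Datatype current = schema_type;
  for (const auto& filter : pipeline) {
    if (!filter->accepts_input_datatype(current)) {
      return std::nullopt;
    }
    current = filter->output_datatype(current);
  }
  return current;
}
```

With this shape, an integer-only filter placed after a float-to-integer conversion validates against a float attribute, while the same filter placed first does not.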

@shaunrd0 shaunrd0 requested a review from ihnorton July 12, 2023 13:45
@ihnorton ihnorton requested review from davisp and KiterLuc July 12, 2023 13:52
@davisp davisp left a comment

Leaving these comments as a first pass review. I'll re-review deeper when I'm not sitting in my truck outside the doctor's office.

tiledb/sm/filter/filter_pipeline.cc (outdated, resolved)
tiledb/sm/filter/filter_pipeline.cc (outdated, resolved)
tiledb/sm/filter/webp_filter.cc (resolved)
tiledb/sm/filter/xor_filter.cc (resolved)
tiledb/sm/tile/tile.h (outdated, resolved)
tiledb/sm/filter/filter.h (resolved)
@shaunrd0 shaunrd0 force-pushed the smr/sc-24079/filter-output-type branch from 9ca7ca7 to 069855c Compare July 13, 2023 13:35
tiledb/sm/filter/filter.h (outdated, resolved)
tiledb/sm/filter/filter.cc (outdated, resolved)
tiledb/sm/filter/xor_filter.h (outdated, resolved)
@@ -221,6 +259,11 @@ Status FilterPipeline::filter_chunks_forward(
&output_metadata,
&output_data));

// Final tile type will be the output type of last filter in pipeline.

Can't we do this in the writer when constructing the tile? I'd like to avoid changing the type after the tile was created.
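
One way to read this suggestion is sketched below: compute the pipeline's final output type before the tile exists, then construct the tile with that type so it never changes afterwards. Every name here (`TypeMap`, `final_output_type`, `FilteredTile`, `make_filtered_tile`) is hypothetical and only illustrates the idea; whether the merged change took this route is not stated in this thread.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the library's datatype enum.
enum class Datatype { UINT8, INT32, FLOAT32 };

// Each pipeline stage is modeled only by how it maps input type to output type.
using TypeMap = std::function<Datatype(Datatype)>;

// Folds the schema type through every stage to get the type of filtered data.
Datatype final_output_type(
    const std::vector<TypeMap>& stages, Datatype schema_type) {
  Datatype current = schema_type;
  for (const auto& stage : stages) {
    current = stage(current);
  }
  return current;
}

// Hypothetical tile whose datatype is fixed at construction.
class FilteredTile {
 public:
  explicit FilteredTile(Datatype type) : type_(type) {}
  Datatype type() const { return type_; }

 private:
  const Datatype type_;  // no setter: the type cannot change after creation
};

// Writer-side helper: decide the type first, then build the tile with it.
FilteredTile make_filtered_tile(
    const std::vector<TypeMap>& stages, Datatype schema_type) {
  return FilteredTile(final_output_type(stages, schema_type));
}
```

Making the member `const` turns "do not change the type after creation" into something the compiler enforces, at the cost of making the tile non-assignable.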

if (last_filter) {
void* output_chunk_buffer =
static_cast<char*>(tile->data()) + chunk_data.chunk_offsets_[i];
RETURN_NOT_OK(output_data.set_fixed_allocation(
output_chunk_buffer, chunk.unfiltered_data_size_));
reader_stats->add_counter(
"read_unfiltered_byte_num", chunk.unfiltered_data_size_);
// Restore tile datatype to its initial schema value.

I think the tile created by the reader should have the right type from the get go. My understanding is that the filter pipeline will use temporary buffers until we reach the last filter, at which point we'll copy to the final tile. Am I incorrect? I don't think we should change the tile type after it's been created.

test/src/unit-filter-pipeline.cc (resolved)
tiledb/sm/filter/float_scaling_filter.h (outdated, resolved)
davisp previously requested changes Jul 14, 2023
@davisp davisp left a comment

Just two minor changes to revert.

tiledb/sm/filter/filter_pipeline.cc (outdated, resolved)
tiledb/sm/filter/filter_pipeline.h (outdated, resolved)
tiledb/sm/filter/bit_width_reduction_filter.h (outdated, resolved)
tiledb/sm/filter/bit_width_reduction_filter.h (outdated, resolved)
tiledb/sm/filter/compression_filter.h (resolved)
tiledb/sm/filter/compression_filter.h (resolved)
tiledb/sm/filter/filter.h (outdated, resolved)
tiledb/sm/filter/float_scaling_filter.h (resolved)
tiledb/sm/filter/float_scaling_filter.h (outdated, resolved)
tiledb/sm/filter/positive_delta_filter.h (outdated, resolved)
tiledb/sm/filter/webp_filter.h (resolved)
tiledb/sm/filter/xor_filter.h (resolved)
@shaunrd0 shaunrd0 force-pushed the smr/sc-24079/filter-output-type branch from eaaa2a6 to c36371d Compare July 20, 2023 13:52
This reverts commit c5ba20c.

+ Clang format after merge
@ihnorton ihnorton merged commit cb4b69d into dev Jul 28, 2023
@ihnorton ihnorton deleted the smr/sc-24079/filter-output-type branch July 28, 2023 17:40
@ihnorton ihnorton changed the title Typed filter pipeline Filter pipeline support for datatype conversions based on filtered output datatype. Jul 28, 2023
4 participants