[Fix](orc-reader) Fix filling partition or missing column used incorrect row count. #23096
clang-tidy review says "All clean, LGTM! 👍"
LGTM, please add regression test
36fe9e7 to 674c974
ok
run buildall
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
674c974 to 4b8bc5a
run buildall
LGTM
(From new machine) TeamCity pipeline, clickbench performance test result:
Proposed changes
Fix filling partition or missing columns with an incorrect row count.

`_row_reader->nextBatch` returns the number of rows read. When ORC lazy materialization is turned on, that number includes filtered rows, so the caller must look at `numElements` in the row batch to determine how many rows were not filtered and will be filled into the block.

In this case, partition and missing columns were filled using the incorrect row count, which causes a BE crash on `filter.size() != offsets.size()` in the filter column step.

When ORC lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.

Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...
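The row-count mismatch described under "Proposed changes" can be illustrated with a small standalone sketch. This is not the Doris or ORC code: `FakeBatch`, `FakeBlock`, `fake_next_batch`, `fill_partition_column`, `read_block`, and `columns_aligned` are hypothetical names made up for illustration, and the filter is a simple threshold. The only things taken from the description are that the reader's return value counts filtered rows while `numElements` counts the rows that survive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an ORC row batch: numElements is the number of
// rows that survived the lazy-materialization filter.
struct FakeBatch {
    uint64_t numElements = 0;
    std::vector<int64_t> values;
};

// Hypothetical stand-in for a block with one data column and one partition column.
struct FakeBlock {
    std::vector<int64_t> data;
    std::vector<std::string> partition;
};

// Hypothetical stand-in for _row_reader->nextBatch: returns the number of rows
// read from the file, including rows dropped by the filter (v < min_value).
inline uint64_t fake_next_batch(const std::vector<int64_t>& file_rows,
                                int64_t min_value, FakeBatch* batch) {
    batch->values.clear();
    for (int64_t v : file_rows) {
        if (v >= min_value) batch->values.push_back(v);
    }
    batch->numElements = batch->values.size();
    return file_rows.size();
}

// Fill a constant partition column with exactly `rows` entries.
inline std::vector<std::string> fill_partition_column(const std::string& value,
                                                      std::size_t rows) {
    return std::vector<std::string>(rows, value);
}

// use_num_elements == false reproduces the bug (fill by rows read);
// use_num_elements == true reproduces the fix (fill by batch.numElements).
inline FakeBlock read_block(const std::vector<int64_t>& file_rows, int64_t min_value,
                            const std::string& partition_value, bool use_num_elements) {
    FakeBatch batch;
    const uint64_t rows_read = fake_next_batch(file_rows, min_value, &batch);
    assert(batch.numElements <= rows_read);
    const std::size_t fill_rows = use_num_elements ? batch.numElements : rows_read;
    FakeBlock block;
    block.data = batch.values;
    block.partition = fill_partition_column(partition_value, fill_rows);
    return block;
}

// Later steps that apply one filter to every column need equal column lengths.
inline bool columns_aligned(const FakeBlock& block) {
    return block.data.size() == block.partition.size();
}
```

With four input rows of which two pass the filter, `read_block(..., false)` yields a partition column of length 4 next to a data column of length 2, the kind of size disagreement that the description says ends in the `filter.size() != offsets.size()` crash. Passing `true` keeps both columns at length 2.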