[fix](hive) do not split compress data file and support lz4/snappy block codec #23245
LGTM
PR approved by anyone and no changes requested.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
3cd8a91
to
f838049
Compare
LGTM
LGTM
12efedc
to
a140a97
Compare
a140a97
to
e71ca52
Compare
e71ca52
to
d61ce86
Compare
d61ce86
to
d526979
Compare
(From new machine) TeamCity pipeline, clickbench performance test result:
LGTM
PR approved by at least one committer and no changes requested.
…ock codec (apache#23245)
1. Do not split compressed data files.
   Some Hive data files are compressed with gzip, deflate, etc. Files of these kinds cannot be split.
2. Support the lz4 block codec.
   For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.
3. Support the snappy block codec.
   For Hadoop snappy.
4. Optimize the `count(*)` query on csv files.
   For a query like `select count(*) from tbl`, only the lines need to be split, not the columns.
Need to pick to branch-2.0 after this PR: apache#22304
…ock codec (#23245) (#23526)
* [Refactor](load) Extract load public code (#22304)
* [fix](hive) do not split compress data file and support lz4/snappy block codec (#23245)
  1. Do not split compressed data files.
     Some Hive data files are compressed with gzip, deflate, etc. Files of these kinds cannot be split.
  2. Support the lz4 block codec.
     For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.
  3. Support the snappy block codec.
     For Hadoop snappy.
  4. Optimize the `count(*)` query on csv files.
     For a query like `select count(*) from tbl`, only the lines need to be split, not the columns.
  Need to pick to branch-2.0 after this PR: #22304
Co-authored-by: zzzzzzzs <[email protected]>
Proposed changes
1. Do not split compressed data files.
   Some Hive data files are compressed with gzip, deflate, etc. Files of these kinds cannot be split.
2. Support the lz4 block codec.
   For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.
3. Support the snappy block codec.
   For Hadoop snappy.
4. Optimize the `count(*)` query on csv files.
   For a query like `select count(*) from tbl`, only the lines need to be split, not the columns.

Need to pick to branch-2.0 after this PR: #22304
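
To illustrate the first change: a gzip or deflate stream can only be decoded from its beginning, so a reader handed a byte range starting in the middle of the file has nothing it can decompress. A minimal sketch of split planning that respects this is shown below. It is not the code from this PR: `plan_splits`, `FileSplit`, and the suffix list are hypothetical names chosen for illustration, and detecting compression by file suffix is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Assumption: compression is detected by file suffix. The suffix list is
# illustrative only; the PR itself just names "gzip, deflate, etc."
NON_SPLITTABLE_SUFFIXES = (".gz", ".deflate")


@dataclass(frozen=True)
class FileSplit:
    """A byte range of one file, handed to a single scanner (hypothetical type)."""
    path: str
    offset: int
    length: int


def plan_splits(path: str, file_size: int, split_size: int) -> List[FileSplit]:
    """Cut a file into byte-range splits, but keep compressed files whole."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    if file_size == 0:
        return []
    # A compressed stream must be read from byte 0, so emit one split
    # covering the whole file instead of several ranges.
    if path.lower().endswith(NON_SPLITTABLE_SUFFIXES):
        return [FileSplit(path, 0, file_size)]
    return [
        FileSplit(path, offset, min(split_size, file_size - offset))
        for offset in range(0, file_size, split_size)
    ]
```

With a split size of 30 bytes, a 100-byte `a.csv` is cut into four ranges, while a 100-byte `a.csv.gz` stays a single split.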
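
The block codec changes are easier to follow with the framing in view. The sketch below assumes the layout written by Hadoop's block compressor streams: each block starts with a 4-byte big-endian uncompressed length, followed by one or more chunks, each prefixed with a 4-byte big-endian compressed length. Treat that layout as an assumption to check against the Hadoop source. The lz4 frame format, by contrast, begins with its own magic number and frame header, so a frame decoder cannot read such files. `decode_hadoop_blocks` and `encode_hadoop_blocks` are hypothetical helpers, and `zlib` stands in for the codec only because lz4 and snappy are not in the Python standard library.

```python
import struct
import zlib
from typing import Callable


def decode_hadoop_blocks(data: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    """Decode a buffer of Hadoop-style blocks (assumed layout, see lead-in)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        # Block header: uncompressed size of the whole block.
        (raw_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        produced = 0
        # A block may be stored as several compressed chunks.
        while produced < raw_len:
            (chunk_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            chunk = data[pos:pos + chunk_len]
            if len(chunk) != chunk_len:
                raise ValueError("truncated chunk")
            pos += chunk_len
            piece = decompress(chunk)
            if not piece:
                raise ValueError("chunk decompressed to zero bytes")
            produced += len(piece)
            out += piece
        if produced != raw_len:
            raise ValueError("block length mismatch")
    return bytes(out)


def encode_hadoop_blocks(payload: bytes, compress: Callable[[bytes], bytes],
                         block_size: int = 64 * 1024) -> bytes:
    """Write one chunk per block; used here only to build test input."""
    out = bytearray()
    for start in range(0, len(payload), block_size):
        raw = payload[start:start + block_size]
        chunk = compress(raw)
        out += struct.pack(">I", len(raw))
        out += struct.pack(">I", len(chunk))
        out += chunk
    return bytes(out)


# Round trip with zlib as a stand-in codec.
payload = b"hello hive " * 1000
framed = encode_hadoop_blocks(payload, zlib.compress, block_size=4096)
restored = decode_hadoop_blocks(framed, zlib.decompress)
```

In a real reader the `decompress` callable would be an lz4 block or raw snappy decompressor; the framing loop stays the same.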
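
The `count(*)` optimization can be sketched in the same spirit: if only the number of rows matters, the reader can count line delimiters and never look for column separators. The function below is a hypothetical illustration, not the PR's implementation. It assumes a single-byte line delimiter, no header row, and no quoted fields containing newlines.

```python
import io
from typing import BinaryIO


def count_csv_rows(stream: BinaryIO, line_delimiter: bytes = b"\n",
                   chunk_size: int = 1 << 20) -> int:
    """Count rows by scanning for line delimiters only; columns are never parsed."""
    if len(line_delimiter) != 1:
        raise ValueError("this sketch only handles single-byte line delimiters")
    rows = 0
    last_byte = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        rows += chunk.count(line_delimiter)
        last_byte = chunk[-1:]
    # A final line without a trailing delimiter still counts as a row.
    if last_byte and last_byte != line_delimiter:
        rows += 1
    return rows


row_count = count_csv_rows(io.BytesIO(b"a,b,c\n1,2,3\n4,5,6"))
```

Because the loop works on raw bytes, the cost per row does not depend on the number of columns.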