[fix](hive) do not split compress data file and support lz4/snappy block codec (#23245) #23526

Merged
2 commits merged on Aug 26, 2023

@morningman morningman commented Aug 26, 2023

cherry-pick #22304 and #23245

@morningman morningman closed this Aug 26, 2023
…ock codec (apache#23245)

1. Do not split compressed data files
Some data files in Hive are compressed with gzip, deflate, etc.
Files of these kinds cannot be split.

2. Support lz4 block codec
For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.

4. Support snappy block codec
For Hadoop snappy.

5. Optimize the `count(*)` query on CSV files
For a query like `select count(*) from tbl`, only the lines need to be split; there is no need to split the columns.
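
Items 1 and 5 above can be illustrated with a minimal sketch. This is not the code from this PR: `FileSplit`, `is_compressed_file`, `plan_splits`, and `count_rows` are hypothetical names, and detecting compression from the file extension is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte range of one file handed to a single scanner.
struct FileSplit {
    std::int64_t offset;
    std::int64_t length;
};

// Assumption: compression is inferred from the extension. Only the two
// formats named in the commit message are listed here.
inline bool is_compressed_file(const std::string& path) {
    static const std::vector<std::string> kSuffixes = {".gz", ".deflate"};
    for (const auto& suffix : kSuffixes) {
        if (path.size() >= suffix.size() &&
            path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return true;
        }
    }
    return false;
}

// Item 1: a compressed file becomes exactly one split covering the whole
// file; an uncompressed file is cut into ranges of at most split_size bytes.
inline std::vector<FileSplit> plan_splits(const std::string& path,
                                          std::int64_t file_size,
                                          std::int64_t split_size) {
    assert(split_size > 0);
    std::vector<FileSplit> splits;
    if (file_size <= 0) {
        return splits;
    }
    if (is_compressed_file(path)) {
        splits.push_back({0, file_size});
        return splits;
    }
    for (std::int64_t off = 0; off < file_size; off += split_size) {
        splits.push_back({off, std::min(split_size, file_size - off)});
    }
    return splits;
}

// Item 5: count rows by looking only at line delimiters; columns are never
// split. Assumption: rows end with '\n' and no field contains a newline.
inline std::int64_t count_rows(const std::string& data) {
    std::int64_t rows = std::count(data.begin(), data.end(), '\n');
    // A final row without a trailing newline still counts as a row.
    if (!data.empty() && data.back() != '\n') {
        ++rows;
    }
    return rows;
}
```

With these definitions, `plan_splits("part-0.csv", 250, 100)` yields three ranges, while the same call on `part-0.csv.gz` yields a single range covering all 250 bytes, since a reader cannot start decoding in the middle of a gzip or deflate stream.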

Need to pick to branch-2.0 after this PR: apache#22304
@morningman morningman reopened this Aug 26, 2023
@github-actions github-actions bot added the area/planner (Issues or PRs related to the query planner) and kind/test labels Aug 26, 2023
@morningman morningman changed the title from "[Refactor](load) Extract load public code (#22304)" to "[fix](hive) do not split compress data file and support lz4/snappy block codec (#23245)" Aug 26, 2023

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

// specific language governing permissions and limitations
// under the License.

#include "util/load_util.h"

warning: 'util/load_util.h' file not found [clang-diagnostic-error]

#include "util/load_util.h"
         ^


class LoadUtilTest : public testing::Test {
public:
LoadUtilTest() {}

warning: use '= default' to define a trivial default constructor [modernize-use-equals-default]

Suggested change
- LoadUtilTest() {}
+ LoadUtilTest() = default;

@xiaokang xiaokang merged commit fd8882b into apache:branch-2.0 Aug 26, 2023