[GLUTEN-3582] Support PageIndex #4634

Merged: 24 commits, Mar 20, 2024

@baibaichen baibaichen commented Feb 4, 2024

What changes were proposed in this pull request?

  1. Modeled on KeyCondition: evaluate the filter expression in RPN (reverse Polish notation).
  2. Once the RowRanges have been computed, skip unneeded I/O via getStream.
  3. Continue to use the Arrow Parquet reader.

(Fixes: #3582)
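
The first two points can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes a single integer column, per-page min/max statistics, and a filter already converted to RPN. Every name here (`Range`, `RowRanges` as a plain vector, `PageStats`, `RPNElement`, `evaluate`) is hypothetical. Atoms push the rows of the pages that may match; `And`/`Or` pop two operands and push their intersection/union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// All names below are hypothetical and exist only for this illustration.
struct Range { int64_t from; int64_t to; };   // inclusive row interval
using RowRanges = std::vector<Range>;         // kept sorted and non-overlapping

// Min/max statistics of one data page plus the rows it covers.
struct PageStats { int64_t first_row; int64_t last_row; int64_t min; int64_t max; };

enum class Op { GreaterThan, LessThan, And, Or };
struct RPNElement { Op op; int64_t literal; };  // literal is ignored for And/Or

// Union of two range lists; overlapping or adjacent ranges are merged.
RowRanges unite(RowRanges a, const RowRanges & b)
{
    a.insert(a.end(), b.begin(), b.end());
    std::sort(a.begin(), a.end(), [](const Range & l, const Range & r) { return l.from < r.from; });
    RowRanges out;
    for (const Range & r : a)
    {
        if (!out.empty() && r.from <= out.back().to + 1)
            out.back().to = std::max(out.back().to, r.to);
        else
            out.push_back(r);
    }
    return out;
}

// Intersection of two sorted range lists (two-pointer sweep).
RowRanges intersect(const RowRanges & a, const RowRanges & b)
{
    RowRanges out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        const int64_t from = std::max(a[i].from, b[j].from);
        const int64_t to = std::min(a[i].to, b[j].to);
        if (from <= to)
            out.push_back(Range{from, to});
        if (a[i].to < b[j].to) ++i; else ++j;
    }
    return out;
}

// Rows of every page whose [min, max] may contain a value satisfying the atom.
RowRanges matchPages(const std::vector<PageStats> & pages, Op op, int64_t literal)
{
    RowRanges out;
    for (const PageStats & p : pages)
    {
        assert(p.first_row <= p.last_row);
        const bool may_match = (op == Op::GreaterThan) ? p.max > literal : p.min < literal;
        if (may_match)
            out = unite(out, RowRanges{Range{p.first_row, p.last_row}});
    }
    return out;
}

// Evaluate the RPN with a stack of RowRanges.
RowRanges evaluate(const std::vector<RPNElement> & rpn, const std::vector<PageStats> & pages)
{
    std::vector<RowRanges> stack;
    for (const RPNElement & e : rpn)
    {
        if (e.op == Op::And || e.op == Op::Or)
        {
            if (stack.size() < 2)
                throw std::logic_error("malformed RPN");
            RowRanges rhs = std::move(stack.back()); stack.pop_back();
            RowRanges lhs = std::move(stack.back()); stack.pop_back();
            stack.push_back(e.op == Op::And ? intersect(lhs, rhs) : unite(lhs, rhs));
        }
        else
            stack.push_back(matchPages(pages, e.op, e.literal));
    }
    if (stack.size() != 1)
        throw std::logic_error("malformed RPN");
    return stack.back();
}
```

For three pages covering rows 0-99, 100-199 and 200-299 with value ranges 0-9, 10-19 and 20-29, the filter `x > 12 AND x < 18` (RPN: `GreaterThan 12`, `LessThan 18`, `And`) yields the single range 100-199, so only the second page would need to be read.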

How was this patch tested?

Using existing unit tests.

github-actions bot commented Feb 4, 2024

#3582

github-actions bot commented Feb 4, 2024

Run Gluten Clickhouse CI

github-actions bot commented Mar 2, 2024

Run Gluten Clickhouse CI

github-actions bot commented Mar 5, 2024

Run Gluten Clickhouse CI

github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@baibaichen
Contributor Author

dd7b6b232764642bcfb3abc29e70560

@baibaichen
Contributor Author

Run Gluten Clickhouse CI

github-actions bot commented Mar 7, 2024

Run Gluten Clickhouse CI

add ParquetFileReaderExtBase
add readColumnChunkPageBase
simplify build read
remove redundant code
remove current_row_group_
std::vector<int32_t> row_groups_ => std::deque<int32_t> row_groups_
std::vector<std::unique_ptr<RowRanges>> row_group_row_ranges_ => std::unordered_map<int32_t, std::unique_ptr<RowRanges>> row_group_row_ranges_
std::vector<std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_ => std::unordered_map<int32_t, std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_;
remove std::vector<std::unique_ptr<parquet::RowGroupMetaData>> row_group_metas_;
remove std::vector<std::shared_ptr<parquet::RowGroupPageIndexReader>> row_group_index_readers_
(cherry picked from commit bce0c6668d7bb397127eefeac1943d4c02cf79dc)
fix a stupid bug!
add testDataPath
getTpcdsDataPath() => tpcdsDataPath
getClickHouseLibPath() => clickHouseLibPath
(cherry picked from commit bb0267135243ff8ad980b0521d8302e150a2c4e4)
(cherry picked from commit 98dc9a79bf4f372ecabcac9b47aa06cd328f1aa4)
(cherry picked from commit 2fb41831f4e338503ff620ce5eac9917bdb68f6a)
(cherry picked from commit 1ace73205a033e14ca1659f063eb1df65c3e9969)
(cherry picked from commit e7d8fbe701fcd92fb6cb167686602561adc26ec4)
(cherry picked from commit 1ee0516e2eadf045b4aec63de67cf5cb97810217)
(cherry picked from commit 1e9cdd3b08eb4e026a739ee558e9c2dd0c4c88fb)
…nges>>;

using ColumnIndexStoreMap = absl::flat_hash_map<Int32, std::unique_ptr<ColumnIndexStore>>;

(cherry picked from commit 610fcd038d24d54fa30bcc40ab0d4d39f60dd0c4)
(cherry picked from commit 8d85db48fe1c93dbc05404aa580b3f11de94c51d)
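
The container changes listed in the commit messages above (`std::vector` to `std::deque` for `row_groups_`, and vectors to maps keyed by row group index) can be illustrated with a small sketch. This is one plausible reading of those commits, not the PR's actual class: `RowGroupQueue`, `add`, `next` and `pending` are hypothetical names, and `RowRanges` is simplified to a vector of ranges. The idea shown is that pending row groups are consumed from the front of a deque, while per-row-group state is looked up by row group index rather than by position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct Range { int64_t from; int64_t to; };
using RowRanges = std::vector<Range>;

class RowGroupQueue
{
public:
    // Register a row group that survived filtering, with the rows to read from it.
    void add(int32_t row_group, RowRanges ranges)
    {
        row_groups_.push_back(row_group);
        row_group_row_ranges_[row_group] = std::make_unique<RowRanges>(std::move(ranges));
    }

    // Pop the next row group and hand over its RowRanges; std::nullopt when none are left.
    std::optional<std::pair<int32_t, std::unique_ptr<RowRanges>>> next()
    {
        if (row_groups_.empty())
            return std::nullopt;
        const int32_t row_group = row_groups_.front();
        row_groups_.pop_front();  // O(1) at the front, unlike std::vector
        auto it = row_group_row_ranges_.find(row_group);
        assert(it != row_group_row_ranges_.end());
        std::unique_ptr<RowRanges> ranges = std::move(it->second);
        row_group_row_ranges_.erase(it);  // state is released once the row group is handed out
        return std::make_pair(row_group, std::move(ranges));
    }

    std::size_t pending() const { return row_groups_.size(); }

private:
    std::deque<int32_t> row_groups_;
    // Keyed by row group index, so skipped row groups need no placeholder entries.
    std::unordered_map<int32_t, std::unique_ptr<RowRanges>> row_group_row_ranges_;
};
```

With a map keyed by row group index, a file where only row groups 3 and 7 survive filtering holds two entries instead of a vector sized to the whole file.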

Run Gluten Clickhouse CI

@lgbo-ustc
Contributor

LGTM

@baibaichen
Contributor Author

GlutenWithCHStandard tpch-data-sf100-bucket
With 4634 mean-total 60332, min-total 57646
Without 4634 mean-total 67155, min-total 65188

Roughly 7 to 8 seconds of improvement (mean-total down by 6823, min-total down by 7542).

@baibaichen
Contributor Author

Let's Merge

Development

Successfully merging this pull request may close these issues.

[CH] Improve parquet reader performance
4 participants