[GLUTEN-4994][CH]Fix function conversions #4995

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

KevinyhZou
Contributor

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

(Fixes: #4994)

How was this patch tested?

#4994

Run Gluten Clickhouse CI

Contributor

@baibaichen baibaichen left a comment

LGTM

@baibaichen
Contributor

Re-run macro-benchmark once #4634 is merged

@baibaichen baibaichen merged commit 81b023e into apache:main Mar 18, 2024
4 checks passed
baibaichen added a commit to baibaichen/gluten that referenced this pull request Mar 18, 2024
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_master_03_18_2024_time.csv | log/native_master_03_17_2024_2ca27fb04_time.csv | difference | percentage |
|-------|------:|------:|------:|------:|
| q1 | 34.55 | 35.76 | 1.209 | 103.50% |
| q2 | 23.93 | 23.82 | -0.114 | 99.52% |
| q3 | 37.82 | 36.74 | -1.078 | 97.15% |
| q4 | 38.56 | 39.53 | 0.972 | 102.52% |
| q5 | 70.54 | 70.25 | -0.294 | 99.58% |
| q6 | 7.37 | 7.38 | 0.010 | 100.13% |
| q7 | 82.54 | 84.15 | 1.612 | 101.95% |
| q8 | 83.80 | 84.82 | 1.018 | 101.21% |
| q9 | 122.07 | 122.79 | 0.714 | 100.58% |
| q10 | 47.80 | 46.92 | -0.880 | 98.16% |
| q11 | 20.52 | 20.14 | -0.389 | 98.10% |
| q12 | 29.90 | 25.87 | -4.033 | 86.52% |
| q13 | 48.72 | 47.00 | -1.723 | 96.46% |
| q14 | 16.49 | 22.07 | 5.575 | 133.80% |
| q15 | 29.10 | 32.36 | 3.265 | 111.22% |
| q16 | 13.70 | 14.09 | 0.389 | 102.84% |
| q17 | 100.81 | 100.46 | -0.356 | 99.65% |
| q18 | 143.37 | 143.53 | 0.161 | 100.11% |
| q19 | 15.76 | 13.60 | -2.169 | 86.24% |
| q20 | 28.60 | 29.63 | 1.025 | 103.59% |
| q21 | 225.81 | 226.19 | 0.385 | 100.17% |
| q22 | 14.02 | 13.82 | -0.194 | 98.62% |
| total | 1235.80 | 1240.91 | 5.105 | 100.41% |
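
The report does not say how the `difference` and `percentage` columns are derived. Judging from the numbers, they look like the second timing minus the first, and the second timing divided by the first. The sketch below encodes that reading; `TimingRow`, `difference` and `percentage` are hypothetical names, not part of the benchmark tooling. Because the report prints timings rounded to two decimals, recomputed values match the published ones only approximately.

```cpp
#include <cassert>
#include <string>

// Hypothetical representation of one report row: a query name and the
// two timings read from the two CSV logs named in the header.
struct TimingRow {
    std::string query;
    double first;   // value from the first CSV column
    double second;  // value from the second CSV column
};

// Assumed definition: second timing minus first timing.
double difference(const TimingRow& row) {
    return row.second - row.first;
}

// Assumed definition: second timing as a percentage of the first.
double percentage(const TimingRow& row) {
    assert(row.first > 0.0);  // a zero timing would make the ratio meaningless
    return row.second / row.first * 100.0;
}
```

For the q12 row (29.90 and 25.87) this gives a difference of about -4.03 and a percentage of about 86.52, in line with the report.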

baibaichen added a commit to baibaichen/gluten that referenced this pull request Mar 19, 2024
baibaichen added a commit to baibaichen/gluten that referenced this pull request Mar 19, 2024
baibaichen added a commit that referenced this pull request Mar 20, 2024
* Fix typo

(cherry picked from commit c3fbf13)

* 1. Use FutureSetFromTuple instead of FutureSetFromStorage. FutureSetFromTuple can buildOrderedSetInplace automatically; FutureSetFromStorage needs Sizelimits set manually

2. Support PageIndex; set spark.gluten.sql.columnar.backend.ch.runtime_config.use_local_format to true again.

3. Remove skipped test

* refactor gtest

* fix build due to #4664

* v2 for finding performance issue

* Refactor:
add ParquetFileReaderExtBase
add readColumnChunkPageBase
simplify build read
remove redundant codes
remove current_row_group_
std::vector<int32_t> row_groups_ => std::deque<int32_t> row_groups_
std::vector<std::unique_ptr<RowRanges>> row_group_row_ranges_ => std::unordered_map<int32_t, std::unique_ptr<RowRanges>> row_group_row_ranges_
std::vector<std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_ => std::unordered_map<int32_t, std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_;
remove std::vector<std::unique_ptr<parquet::RowGroupMetaData>> row_group_metas_;
remove std::vector<std::shared_ptr<parquet::RowGroupPageIndexReader>> row_group_index_readers_

* new loop

* Cleanup

* Cleanup

* Revert: fix build due to #4664

* support case_insensitive_column_matching of parquet

(cherry picked from commit bce0c6668d7bb397127eefeac1943d4c02cf79dc)

* fix case_insensitive_column_matching issue
fix a stupid bug!
add testDataPath
getTpcdsDataPath() => tpcdsDataPath
getClickHouseLibPath() => clickHouseLibPath

* add benchmark

(cherry picked from commit bb0267135243ff8ad980b0521d8302e150a2c4e4)

* lowercase first letter of function name

(cherry picked from commit 98dc9a79bf4f372ecabcac9b47aa06cd328f1aa4)

* add comments

(cherry picked from commit 2fb41831f4e338503ff620ce5eac9917bdb68f6a)

* Remove Camel case member variable

(cherry picked from commit 1ace73205a033e14ca1659f063eb1df65c3e9969)

* Use Int32 instead of int32_t

(cherry picked from commit e7d8fbe701fcd92fb6cb167686602561adc26ec4)

* Camel case for function name

(cherry picked from commit 1ee0516e2eadf045b4aec63de67cf5cb97810217)

* add ColumnIndexFilterPtr alias

(cherry picked from commit 1e9cdd3b08eb4e026a739ee558e9c2dd0c4c88fb)

* using RowRangesMap = absl::flat_hash_map<Int32, std::unique_ptr<RowRanges>>;
using ColumnIndexStoreMap = absl::flat_hash_map<Int32, std::unique_ptr<ColumnIndexStore>>;

(cherry picked from commit 610fcd038d24d54fa30bcc40ab0d4d39f60dd0c4)

* fix style

(cherry picked from commit 8d85db48fe1c93dbc05404aa580b3f11de94c51d)

* fix benchmark due to #4995

* fix build due to ClickHouse/ClickHouse#61502

* fix assertion failed in Debug Build
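
The container changes listed under "Refactor" above (a `std::deque<int32_t>` of row groups, per-row-group state in a `std::unordered_map` keyed by row group index, and no `current_row_group_` cursor) can be illustrated with a small sketch. This is not the code from the commit: `RowGroupQueue`, its methods, and the one-field `RowRanges` stand-in are hypothetical, and the motivation suggested here (consume pending row groups from the front, keep state only for the row groups that are actually read) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real RowRanges type.
struct RowRanges {
    std::int64_t row_count = 0;
};

// Hypothetical holder mirroring the member layout named in the commit message.
class RowGroupQueue {
public:
    // Registers a row group to read, together with its row ranges.
    void add(std::int32_t row_group, std::int64_t rows) {
        row_groups_.push_back(row_group);
        auto ranges = std::make_unique<RowRanges>();
        ranges->row_count = rows;
        row_group_row_ranges_[row_group] = std::move(ranges);
    }

    bool empty() const { return row_groups_.empty(); }

    // Pops the next pending row group and hands over its row ranges.
    // The front of the deque plays the role of a "current row group" cursor.
    std::unique_ptr<RowRanges> next(std::int32_t& row_group) {
        assert(!row_groups_.empty());
        row_group = row_groups_.front();
        row_groups_.pop_front();
        auto it = row_group_row_ranges_.find(row_group);
        assert(it != row_group_row_ranges_.end());
        auto ranges = std::move(it->second);
        row_group_row_ranges_.erase(it);
        return ranges;
    }

private:
    std::deque<std::int32_t> row_groups_;
    std::unordered_map<std::int32_t, std::unique_ptr<RowRanges>> row_group_row_ranges_;
};
```

With a keyed map, row group indices do not have to be dense or start at zero, so a reader that selects only row groups 7 and 2 stores exactly two entries.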
@KevinyhZou KevinyhZou deleted the Fix_function_conversions branch March 25, 2024 09:04
Development

Successfully merging this pull request may close these issues.

[CH] Gluten build error as FunctionConversions.h removed from ClickHouse code
3 participants