[Opt](TabletSchema) reuse TabletColumn info to reduce mem #42448

eldenmoon · 2024-10-24T15:16:05Z

When a cluster holds a large number of identical TabletColumns, which usually happens when VARIANT type columns are modified or added, each Rowset carries its own TabletSchema. An excessive number of TabletSchemas leads to significant memory overhead. Sharing one in-memory instance among identical TabletColumns greatly reduces this consumption.
Using the serialized TabletSchema as the LRU cache key also increases memory usage when the cache holds many schemas. To reduce the footprint, we record only a key signature (a UUID computed with a hash algorithm), look up that signature in the LRU cache, and check the key on a hit to guard against hash collisions.
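The idea of sharing one instance among identical columns can be illustrated with a small interning pool. This is only a sketch under stated assumptions, not the code in this PR: `Column`, `ColumnPool`, `intern`, and the use of a serialized string as the identity of a column are all hypothetical names and choices made for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for TabletColumn; only what the sketch needs.
struct Column {
    std::string name;
    int type = 0;

    // Assumption: two columns are identical when their serialized forms match.
    std::string serialize() const { return name + '\0' + std::to_string(type); }
};

// Hypothetical pool that hands out one shared instance per distinct column.
class ColumnPool {
public:
    std::shared_ptr<const Column> intern(const Column& col) {
        const std::string key = col.serialize();
        std::lock_guard<std::mutex> guard(_mu);
        auto it = _pool.find(key);
        if (it != _pool.end()) {
            if (auto existing = it->second.lock()) {
                return existing; // an identical column is still alive: reuse it
            }
        }
        // First sighting (or the previous instance has been released): create one.
        std::shared_ptr<const Column> created = std::make_shared<Column>(col);
        _pool[key] = created; // weak reference, so the pool never keeps columns alive
        return created;
    }

    // Number of map entries, including expired ones (this sketch never purges them).
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_mu);
        return _pool.size();
    }

private:
    mutable std::mutex _mu;
    std::unordered_map<std::string, std::weak_ptr<const Column>> _pool;
};
```

With such a pool, every schema that contains an identical column holds a pointer to the same object, so the per-Rowset cost drops to one pointer per column. The weak references mean a column is freed once the last schema using it goes away.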
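The signature-keyed lookup with a collision check can be sketched as follows. This is a minimal illustration, not the implementation in this PR: `Schema`, `SchemaCache`, `get_or_insert`, the 64-bit signature, and the injectable hash function are all assumptions. In particular, the stand-in `Schema` simply keeps its serialized form so the collision check has something to compare against; a real schema would more likely compare parsed fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed TabletSchema.
struct Schema {
    std::string serialized;
};

// Hypothetical LRU cache keyed by a hash signature instead of the full serialized schema.
class SchemaCache {
public:
    using HashFn = std::function<uint64_t(const std::string&)>;

    explicit SchemaCache(std::size_t capacity, HashFn hash = std::hash<std::string>{})
            : _capacity(capacity), _hash(std::move(hash)) {}

    std::shared_ptr<const Schema> get_or_insert(const std::string& serialized) {
        const uint64_t sig = _hash(serialized);
        auto it = _index.find(sig);
        if (it != _index.end()) {
            // Signature hit: verify the cached value really matches the request.
            if (it->second->second->serialized == serialized) {
                _lru.splice(_lru.begin(), _lru, it->second); // mark as most recently used
                return it->second->second;
            }
            // Hash collision: leave the cached entry alone and return an uncached schema.
            return std::make_shared<Schema>(Schema{serialized});
        }
        auto schema = std::make_shared<Schema>(Schema{serialized});
        _lru.emplace_front(sig, schema);
        _index[sig] = _lru.begin();
        if (_lru.size() > _capacity) { // evict the least recently used entry
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
        return schema;
    }

    std::size_t size() const { return _lru.size(); }

private:
    using Entry = std::pair<uint64_t, std::shared_ptr<const Schema>>;
    std::size_t _capacity;
    HashFn _hash;
    std::list<Entry> _lru; // front = most recently used
    std::unordered_map<uint64_t, std::list<Entry>::iterator> _index;
};
```

The cache index stores only fixed-size signatures, so its key memory no longer grows with schema size. The equality check on a hit keeps a collision from returning the wrong schema, at the cost of one comparison per lookup.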

doris-robot · 2024-10-24T15:16:11Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

eldenmoon · 2024-10-24T15:16:18Z

run buildall

github-actions · 2024-10-24T15:21:42Z

clang-tidy review says "All clean, LGTM! 👍"

eldenmoon · 2024-10-24T15:40:43Z

run buildall

github-actions

clang-tidy made some suggestions

github-actions · 2024-10-24T15:45:58Z

be/src/olap/tablet_schema.cpp

@@ -942,7 +943,8 @@ void TabletSchema::clear_columns() {
    _cols.clear();
 }

-void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) {
+void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,


warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

Additional context

be/src/olap/tablet_schema.cpp:945: 85 lines including whitespace and comments (threshold 80)

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

eldenmoon · 2024-10-25T02:16:35Z

run buildall

eldenmoon · 2024-10-29T03:13:56Z

run buildall

eldenmoon · 2024-10-29T03:16:16Z

run buildall

eldenmoon · 2024-10-29T07:24:28Z

run buildall

doris-robot · 2024-10-29T08:57:03Z

TeamCity be ut coverage result:
Function Coverage: 37.49% (9727/25946)
Line Coverage: 28.74% (80657/280661)
Region Coverage: 28.15% (41685/148059)
Branch Coverage: 24.72% (21178/85672)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a4fb4efc304993000cafa4eac9ab68dc8fc56c7c_a4fb4efc304993000cafa4eac9ab68dc8fc56c7c/report/index.html

eldenmoon · 2024-10-30T03:32:58Z

run buildall

doris-robot · 2024-10-30T04:55:27Z

TeamCity be ut coverage result:
Function Coverage: 37.96% (9853/25957)
Line Coverage: 29.23% (82034/280673)
Region Coverage: 28.57% (42299/148070)
Branch Coverage: 25.10% (21499/85666)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6df5f7abaaaab13594aad54e96f4994291637217_6df5f7abaaaab13594aad54e96f4994291637217/report/index.html

eldenmoon · 2024-10-31T07:26:12Z

run buildall

doris-robot · 2024-10-31T08:55:25Z

TeamCity be ut coverage result:
Function Coverage: 37.96% (9857/25966)
Line Coverage: 29.20% (82098/281158)
Region Coverage: 28.45% (42358/148876)
Branch Coverage: 25.03% (21522/85990)
Coverage Report: http://coverage.selectdb-in.cc/coverage/076133182948d3e29e1df91e1938fd3c3b894996_076133182948d3e29e1df91e1938fd3c3b894996/report/index.html

eldenmoon · 2024-10-31T10:52:08Z

run buildall

github-actions

clang-tidy made some suggestions

github-actions · 2024-11-04T09:40:23Z

be/src/olap/tablet_schema.cpp

 }

-void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) {
+void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,


warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

Additional context

be/src/olap/tablet_schema.cpp:956: 88 lines including whitespace and comments (threshold 80)

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

eldenmoon · 2024-11-04T09:41:15Z

run buildall

doris-robot · 2024-11-04T11:05:21Z

TeamCity be ut coverage result:
Function Coverage: 37.84% (9832/25984)
Line Coverage: 29.00% (81742/281883)
Region Coverage: 28.24% (42135/149221)
Branch Coverage: 24.82% (21378/86148)
Coverage Report: http://coverage.selectdb-in.cc/coverage/49747e9db3aa093b06c3f00ab923da751fa46d3f_49747e9db3aa093b06c3f00ab923da751fa46d3f/report/index.html

eldenmoon · 2024-11-06T02:02:38Z

run buildall

github-actions

clang-tidy made some suggestions

github-actions · 2024-11-06T02:07:47Z

be/src/olap/tablet_schema.cpp

 }

-void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) {
+void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,


warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

Additional context

be/src/olap/tablet_schema.cpp:963: 88 lines including whitespace and comments (threshold 80)

void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, ^

doris-robot · 2024-11-06T03:49:32Z

TeamCity be ut coverage result:
Function Coverage: 37.89% (9847/25991)
Line Coverage: 29.05% (81907/281915)
Region Coverage: 28.28% (42185/149182)
Branch Coverage: 24.86% (21408/86104)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7fcc7d58f1c335b94adc0584bc83d2ebd65bd90e_7fcc7d58f1c335b94adc0584bc83d2ebd65bd90e/report/index.html

xiaokang · 2024-11-06T04:14:30Z

be/src/olap/rowset/segment_v2/inverted_index_writer.cpp

+            FieldType::OLAP_FIELD_TYPE_ARRAY,
+            FieldType::OLAP_FIELD_TYPE_FLOAT,
+    };
+    if (column.is_extracted_column() && (invalid_types.contains(column.type()))) {


Why check is_extracted_column?

xiaokang · 2024-11-06T04:14:42Z

be/src/olap/rowset/segment_v2/inverted_index_writer.cpp

+    static std::set<FieldType> invalid_types = {
+            FieldType::OLAP_FIELD_TYPE_DOUBLE,
+            FieldType::OLAP_FIELD_TYPE_JSONB,
+            FieldType::OLAP_FIELD_TYPE_ARRAY,


ARRAY is supported by the inverted index.

github-actions · 2024-11-06T04:21:40Z

PR approved by at least one committer and no changes requested.

github-actions · 2024-11-06T04:21:42Z

PR approved by anyone and no changes requested.

qidaye

LGTM

eldenmoon marked this pull request as draft October 24, 2024 15:16

eldenmoon force-pushed the be_column_mem branch from 9b0eb93 to 02bec4e Compare October 24, 2024 15:40

github-actions bot reviewed Oct 24, 2024

eldenmoon force-pushed the be_column_mem branch 3 times, most recently from 101870d to a98b6c0 Compare October 25, 2024 02:15

eldenmoon changed the title ~~Be column mem~~ [Opt](TabletSchema) reuse column info to reduce mem Oct 29, 2024

eldenmoon changed the title ~~[Opt](TabletSchema) reuse column info to reduce mem~~ [Opt](TabletSchema) reuse TabletColumn info to reduce mem Oct 29, 2024

eldenmoon force-pushed the be_column_mem branch 2 times, most recently from a6ae48d to 8e31fa5 Compare October 29, 2024 03:12

eldenmoon added dev/2.1.x dev/3.0.x labels Oct 29, 2024

eldenmoon marked this pull request as ready for review October 29, 2024 03:13

eldenmoon force-pushed the be_column_mem branch from 8e31fa5 to 85d7e58 Compare October 29, 2024 03:15

eldenmoon force-pushed the be_column_mem branch 2 times, most recently from d0f8a69 to 0761331 Compare October 31, 2024 07:25

github-actions bot reviewed Nov 4, 2024

eldenmoon force-pushed the be_column_mem branch from 9f10eaf to 49747e9 Compare November 4, 2024 09:40

Merge branch 'master' into be_column_mem

7fcc7d5

github-actions bot reviewed Nov 6, 2024

xiaokang approved these changes Nov 6, 2024

github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 6, 2024

github-actions bot added the reviewed label Nov 6, 2024

qidaye approved these changes Nov 6, 2024

eldenmoon merged commit 743097a into apache:master Nov 6, 2024
25 of 28 checks passed

eldenmoon deleted the be_column_mem branch November 6, 2024 06:10

github-actions bot added cherry-pick-conflict-in-3.0 cherry-pick-conflict-in-2.1.x labels Nov 6, 2024

eldenmoon mentioned this pull request Nov 6, 2024

[Opt](TabletSchema) reuse TabletColumn info to reduce mem (#42448) #43326

Merged

eldenmoon mentioned this pull request Nov 6, 2024

[Opt](TabletSchema) reuse TabletColumn info to reduce mem (#42448) #43349

Open

eldenmoon added a commit that referenced this pull request Nov 6, 2024

[Opt](TabletSchema) reuse TabletColumn info to reduce mem (#42448) (#…

db22100

…43326) (#42448)

eldenmoon added dev/3.0.3-merged and removed cherry-pick-conflict-in-3.0 dev/3.0.x labels Nov 6, 2024
