[Opt](TabletSchema) reuse TabletColumn info to reduce mem #42448
eldenmoon commented Oct 24, 2024
- When a cluster contains a large number of identical TabletColumns, which typically happens when VARIANT columns are modified or added, each Rowset has its own TabletSchema. That many TabletSchemas can cause significant memory overhead, and sharing the memory of identical TabletColumns greatly reduces it.
- Using the serialized TabletSchema as the LRU cache key also increases memory usage when many schemas are held in the LRU cache. To reduce the footprint, we record only a key signature (a UUID generated by a hash algorithm), look that signature up in the LRU cache, and check the key on a hit in case of a hash collision.
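The column-reuse idea in the first point can be sketched as an interning pool: every schema asks the pool for a column, and identical definitions (same serialized bytes) come back as the same shared object instead of a fresh copy per Rowset. This is a minimal sketch under stated assumptions; `Column`, `ColumnPool`, `build_schema` and the serialization format are hypothetical simplifications, not Doris's actual `TabletColumn` classes or API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for TabletColumn.
struct Column {
    std::string name;
    std::string type;
    bool is_nullable = true;

    // Hypothetical stand-in for serializing the column (e.g. to its protobuf form):
    // identical definitions produce identical bytes. Length prefixes keep the
    // encoding unambiguous.
    std::string serialize() const {
        return std::to_string(name.size()) + ':' + name + std::to_string(type.size()) + ':' +
               type + (is_nullable ? '1' : '0');
    }
};

using ColumnPtr = std::shared_ptr<const Column>;

// Hypothetical pool handing out one shared instance per distinct column definition.
// It holds weak_ptrs, so a column is freed once no schema references it.
class ColumnPool {
public:
    ColumnPtr intern(const Column& col) {
        const std::string key = col.serialize();
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _pool.find(key);
        if (it != _pool.end()) {
            if (ColumnPtr existing = it->second.lock()) {
                return existing;  // identical column still alive: reuse it
            }
        }
        ColumnPtr fresh = std::make_shared<Column>(col);
        _pool[key] = fresh;  // insert, or replace an expired entry
        return fresh;
    }

    // Number of map entries, including expired ones not yet replaced.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _pool.size();
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, std::weak_ptr<const Column>> _pool;
};

// Each schema (e.g. one per Rowset) stores shared pointers instead of copies.
struct Schema {
    std::vector<ColumnPtr> columns;
};

Schema build_schema(ColumnPool& pool, const std::vector<Column>& defs) {
    Schema schema;
    for (const auto& def : defs) {
        schema.columns.push_back(pool.intern(def));
    }
    return schema;
}
```

Holding weak_ptrs is one design choice among several: the pool never extends a column's lifetime, at the cost of leaving expired entries in the map until the same definition is interned again.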
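The second point, keying the cache by a small hash-derived signature and verifying the key on a hit, might look roughly like this. It is a sketch, not the PR's implementation: `Schema`, `SignatureSchemaCache` and the use of `std::hash<std::string>` as the signature function are assumptions (the PR describes a UUID generated by a hash algorithm, whose exact choice is not shown here). Here the full key is never stored; on a hit it is re-derived from the cached value and compared, so a collision is detected instead of returning the wrong schema. Eviction and locking are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for TabletSchema; serialize() plays the role of the
// serialized schema that would otherwise be the full cache key.
struct Schema {
    std::vector<std::string> columns;

    std::string serialize() const {
        std::string out;
        for (const auto& c : columns) {
            // Length-prefix each column so {"ab"} and {"a", "b"} encode differently.
            out += std::to_string(c.size()) + ':' + c;
        }
        return out;
    }
};

using SchemaPtr = std::shared_ptr<const Schema>;
using Hasher = std::function<std::uint64_t(const std::string&)>;

// Hypothetical cache keyed by a fixed-size 64-bit signature instead of the
// serialized schema. Not thread-safe and unbounded (a real one would evict, e.g. LRU).
class SignatureSchemaCache {
public:
    explicit SignatureSchemaCache(Hasher hasher = std::hash<std::string>{})
            : _hasher(std::move(hasher)) {}

    // Returns the shared cached schema equal to `schema`, caching it if the slot is free.
    SchemaPtr get_or_insert(const Schema& schema) {
        const std::string key = schema.serialize();
        const std::uint64_t sig = _hasher(key);
        auto it = _entries.find(sig);
        if (it == _entries.end()) {
            SchemaPtr fresh = std::make_shared<Schema>(schema);
            _entries.emplace(sig, fresh);
            return fresh;
        }
        // Signature matched: re-derive the key from the cached value and compare.
        if (it->second->serialize() == key) {
            ++_hits;
            return it->second;
        }
        // Hash collision: keep the existing entry, return an unshared copy.
        ++_collisions;
        return std::make_shared<Schema>(schema);
    }

    std::size_t size() const { return _entries.size(); }
    std::size_t hits() const { return _hits; }
    std::size_t collisions() const { return _collisions; }

private:
    Hasher _hasher;
    std::unordered_map<std::uint64_t, SchemaPtr> _entries;
    std::size_t _hits = 0;
    std::size_t _collisions = 0;
};
```

Re-deriving the key trades some CPU on each hit for not keeping a large serialized key per cache entry; storing the key alongside the value would make the check cheaper but give back part of the memory saving.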
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
9b0eb93 to 02bec4e
run buildall
clang-tidy made some suggestions
@@ -942,7 +943,8 @@ void TabletSchema::clear_columns() {
     _cols.clear();
 }
 
-void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) {
+void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
Additional context
be/src/olap/tablet_schema.cpp:945: 85 lines including whitespace and comments (threshold 80)
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
101870d to a98b6c0
run buildall
a6ae48d to 8e31fa5
run buildall
8e31fa5 to 85d7e58
run buildall
1 similar comment
run buildall
TeamCity be ut coverage result:
run buildall
TeamCity be ut coverage result:
d0f8a69 to 0761331
run buildall
TeamCity be ut coverage result:
run buildall
clang-tidy made some suggestions
} | ||
|
||
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) { | ||
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, |
warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
Additional context
be/src/olap/tablet_schema.cpp:956: 88 lines including whitespace and comments (threshold 80)
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
9f10eaf to 49747e9
run buildall
TeamCity be ut coverage result:
run buildall
clang-tidy made some suggestions
} | ||
|
||
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns) { | ||
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns, |
warning: function 'init_from_pb' exceeds recommended size/complexity thresholds [readability-function-size]
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
Additional context
be/src/olap/tablet_schema.cpp:963: 88 lines including whitespace and comments (threshold 80)
void TabletSchema::init_from_pb(const TabletSchemaPB& schema, bool ignore_extracted_columns,
^
TeamCity be ut coverage result:
        FieldType::OLAP_FIELD_TYPE_ARRAY,
        FieldType::OLAP_FIELD_TYPE_FLOAT,
};
if (column.is_extracted_column() && (invalid_types.contains(column.type()))) {
why check is_extracted_column?
static std::set<FieldType> invalid_types = {
        FieldType::OLAP_FIELD_TYPE_DOUBLE,
        FieldType::OLAP_FIELD_TYPE_JSONB,
        FieldType::OLAP_FIELD_TYPE_ARRAY,
ARRAY is supported by the inverted index.
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
LGTM