-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trivial count optimization with primary key #60463
Conversation
This is an automated comment for commit 9419a25 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
} | ||
else /// trivial count optimization | ||
{ | ||
expr_or_filter_node = &projection_reading_node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename expr_or_filter_node
to expr_or_filter_or_projection_reading_node
or to something else shorter :)
/// Stores row count from exact ranges of parts. | ||
size_t exact_count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this from AggregateProjectionCandidates
. This is only initialized/used in optimizeUseAggregateProjections itself.
@@ -101,7 +101,9 @@ class ReadFromMergeTree final : public SourceStepWithFilter | |||
UInt64 selected_marks_pk = 0; | |||
UInt64 total_marks_pk = 0; | |||
UInt64 selected_rows = 0; | |||
bool find_exact_ranges = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to has_exact_ranges
?
@@ -88,13 +88,14 @@ String extractFixedPrefixFromLikePattern(std::string_view like_pattern, bool req | |||
return fixed_prefix; | |||
} | |||
|
|||
/// for "^prefix..." string it returns "prefix" | |||
static String extractFixedPrefixFromRegularExpression(const String & regexp) | |||
/// for "^prefix..." string it returns ("prefix", is inclusive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment what is inclusive
flag means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this change in a separate PR?
And, possibly, other changes in KeyCondition which are not about trivial count optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll extract common building blocks into a separate PR.
@@ -158,6 +162,7 @@ static String extractFixedPrefixFromRegularExpression(const String & regexp) | |||
if (!fixed_prefix.empty()) | |||
fixed_prefix.pop_back(); | |||
|
|||
inclusive = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is at least not clear to me.
E.g. for abc.*
, would not this return abc, false
? (I expect abc
should be accepted for this regex)
@@ -1797,6 +1802,8 @@ bool KeyCondition::extractAtomFromTree(const RPNBuilderTreeNode & node, RPNEleme | |||
func_name = "lessOrEquals"; | |||
else if (func_name == "greater") | |||
func_name = "greaterOrEquals"; | |||
|
|||
relaxed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me that this is the only case.
I don't know how to prove the correctness.
Ideally, I would prefer some kind of approach that is clear to be correct, even if it works only for very simple cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will provide some proofs.
src/Storages/MergeTree/BoolMask.h
Outdated
bool isComplete(const BoolMask & initial_mask) const | ||
{ | ||
return {can_be_false, can_be_true}; | ||
if (initial_mask == consider_only_can_be_true) | ||
return can_be_true; | ||
else if (initial_mask == consider_only_can_be_false) | ||
return can_be_false; | ||
else | ||
return can_be_true && can_be_false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand why this change is needed.
const BoolMask BoolMask::consider_only_can_be_true(false, true);
const BoolMask BoolMask::consider_only_can_be_false(true, false);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will explain it in separate PR.
@@ -3106,6 +3113,58 @@ bool KeyCondition::alwaysFalse() const | |||
return rpn_stack[0] == 0; | |||
} | |||
|
|||
bool KeyCondition::canBeFalseAlwaysUnknown() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the implementation.
If we have the function canBeTrueAlwaysUnknown
, what would be the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have the function canBeTrueAlwaysUnknown, what would be the difference?
The main difference would be the implementation of RPNElement::FUNCTION_OR
and RPNElement::FUNCTION_AND
.
atom_can_be_false_always_unknown AND any_atom => atom_can_be_false_always_unknown
atom_can_be_true_always_unknown OR any_atom => atom_can_be_true_always_unknown
canBeFalseAlwaysUnknown
is introduced as an optimization to find exact ranges during key analysis. If the given key condition is not possible to tell some hyperrectangle can be false, exact ranges cannot be determined. We'll disable exact range finding and fall back to faster analysis by checking .can_be_true
only.
{ | ||
if (range.end == marks_count && !has_final_mark) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to ease the implementation of finding exact ranges (end points are important). Will refactor this in separate PR.
@@ -1158,28 +1179,23 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( | |||
} | |||
else | |||
{ | |||
if (has_final_mark && range.end == marks_count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't understand why this isremoved
0e618ca
to
683dc03
Compare
683dc03
to
9419a25
Compare
Unrelated. Failure in
|
96fcc30
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables.
This implements a more general version of #36732
This PR also improves key analysis with less checkInRange calls.
This PR also improves projection analysis with less selectRangesToRead calls.
This PR also fixes incorrect logic in BoolMask, although they are unused before.