-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node-wide throughput exemptions for clients #10755
Node-wide throughput exemptions for clients #10755
Conversation
6d19d35
to
99cfeb4
Compare
99cfeb4
to
c2174e0
Compare
Force push: fixed a followup from #10285, avoid possible segfault when "kafka_throughput_controlled_api_keys" changes on the fly. |
c2174e0
to
bfe47f0
Compare
Force push:
|
bfe47f0
to
64fb201
Compare
Force push: unit test fixed |
/ci-repeat 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great. just a few suggestions about simplifying some stuff like the std::variant usage but those aren't blockers or anything.
struct unspecified_tag {}; | ||
std::variant<unspecified_tag, copyable_RE2> v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this what std::monostate is for, or would this be better represented as optional<RE2>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this what std::monostate is for
not exactly, monostate
is more for "uninitalized" semantics while this one is specifically for "unspecified", that's why I used a separate tag. In theory, there may be more that just these two.
or would this be better represented as optional
it may, but the value of this type is wrapped into optional
itself, so this variant is in part to avoid optional<optional<RE2>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh yeh, optional<optional<re2>>
is sus, but generally i think the outcome is better when stuff is represented as directly as possible, even if it means weird stuff like nested optionals. in actual usages you can unwrap each layer of the optional and then add a comment or something about what is going on.
return re2::RE2::FullMatch( | ||
re2::StringPiece(client_id.c_str(), client_id.length()), *client_id_re); | ||
return std::visit( | ||
[client_id](auto&& v) -> bool { | ||
using T = std::decay_t<decltype(v)>; | ||
if constexpr (std::is_same_v< | ||
T, | ||
client_id_matcher_type::unspecified_tag>) { | ||
// only missing client_id matches the unspecified tag | ||
return !client_id; | ||
} else if constexpr (std::is_same_v<T, copyable_RE2>) { | ||
// missing client_id never matches a re | ||
return client_id | ||
&& re2::RE2::FullMatch(re2::StringPiece(*client_id), v); | ||
} else { | ||
static_assert( | ||
always_false_v<T>, "non-exhaustive client_id_re_type visitor"); | ||
} | ||
}, | ||
client_id_matcher->v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this PR introduces throughput_control_group.cc why have this other commit that rewrites a bunch of it? seems like this should be squashed with the second commit to avoid extra reviewing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to split the large file into smaller parts to make reviewing easier :) If that's the other way around actually, I will squash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh i mean i think it's fine the way it is. i only mentioned it because it seemed like enough code was changing that it wasn't necessarily making review easier. but it isn't a problem either way.
return std::visit( | ||
[client_id](auto&& v) -> bool { | ||
using T = std::decay_t<decltype(v)>; | ||
if constexpr (std::is_same_v< | ||
T, | ||
client_id_matcher_type::unspecified_tag>) { | ||
// only missing client_id matches the unspecified tag | ||
return !client_id; | ||
} else if constexpr (std::is_same_v<T, copyable_RE2>) { | ||
// missing client_id never matches a re | ||
return client_id | ||
&& re2::RE2::FullMatch(re2::StringPiece(*client_id), v); | ||
} else { | ||
static_assert( | ||
always_false_v<T>, "non-exhaustive client_id_re_type visitor"); | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be if std::holds_alternative
instead of using visit with constexpr etc... but maybe using optional woudl be even simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've compared the above code with holds_alternative
approach:
if (std::holds_alternative<client_id_matcher_type::unspecified_tag>(
client_id_matcher->v)) {
// only missing client_id matches the unspecified tag
return !client_id;
}
if (auto* const v = std::get_if<copyable_RE2>(&client_id_matcher->v); v) {
// missing client_id never matches a re
return client_id
&& re2::RE2::FullMatch(re2::StringPiece(*client_id), *v);
}
assert(false, "non-exhaustive client_id_re_type visitor");
So it's a comparable amount of code, a bit simpler, but it has an important drawback: a runtime fallback case. With visit
, a missed case will cause compilation to fail, but with holds_alternative
it will be a runtime assert/soft assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe using optional woudl be even simpler?
+1
client_id); | ||
if (tcgroup_it == _kafka_throughput_control().cend()) { | ||
ctx->_exempt = false; | ||
vlog(klog.info, "qm - No throghput control group assigned"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be at info
level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the normal course that would happen once per connection lifetime, so I think info is fine. Unless we have supported clients that open and drop kafka connections all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we log anything else at connection establishment at info level? i don't think we do...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we log at INFO at connection loss here, but that's an exceptional future handling, so that does not happen on graceful connection close. Ok I'll demote it to DEBUG to align with the rest of connection event
64fb201
to
c975ac2
Compare
Force push:
|
c975ac2
to
1cd7048
Compare
Force push: rebased upstream |
1cd7048
to
9e375ba
Compare
Force push:
|
07786f1
to
cdaf180
Compare
Force push: a typo fixed in property description |
Followup to redpanda-data#10285. There the intake traffic point was left unconditional of the api key, which caused anomalities in tput related metrics.
throughput_control_group is a structural element for tput control groups configuration. This commit adds its implementation sufficient to cover tput exemptions by client_id matched with a regex.
always_false_v moved to utils/functional.h so that it can be reused
Client can omit client_id specification on its side. This is different from empty client_id value, and for throughput_control_group it is also different from omitting client_id (which means match anything regardless of client_id). This commit introduces special selector tags for client_id value in throughput_control_group. The syntax for selector tags is "+name", this makes them distinct from any regex because regex cannot start with a "+". One selector name is introduced: "empty", it matches the omitted client_id values and only them. To support that, throughput_control_group does not store a regex directly anymore. Now it's a variant type with one option for "empty" and another for the regex.
also a typo fixed in another description
connection_context now has all quota related stuff for the connection stored in `_snc_quota_context`. This object is supposed to be created once per connection lifetime by `snc_quota_manager`, but it will be recreated each time a client_id changes on the connection. When the quota context is created (lazily on the connection context), the `kafka_throughput_control` rules are used to select the matching throughput control group. If any group is matched, the context saves it as a flag to exempt the connection from any snc_quota_manager control. This will change into a full association with the control group. Currently the exempt flag simply tells the quota manager to skip any messages in that context.
Set some possible values for `kafka_throughput_control` in the configurations test
cdaf180
to
529cbf3
Compare
Force push: rebased upstream |
/ci-repeat 5 |
/backport v23.1.x |
Failed to run cherry-pick command. I executed the commands below:
|
This PR introduces a feature to make client_ids specified by a value, regex, or by a special selector for missing client_ids exempt from throughput control. Connections presenting a matching client_id value will not have the size of their requests accounted for throughput control, not they will be throttled.
The new configuration is forward looking to support the fully functional throughput control groups, this is why it may look too bulky for this specific PR.
Fixes #10575
Backports Required
Release Notes
Improvements
kafka_throughput_control
can be used to define throughput control groups for which Kafka traffic will not be limited by the values specified bykafka_throughput_limit_node_*_bps
.