CORE-2752 - Fix Kafka quota throttling delay enforcement #18218
Conversation
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48616#018f39b0-9e5c-4b44-88da-d7f93d462f3f
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49104#018f778d-8f78-4b00-9d19-0d54bdd12fbd
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49219#018f8150-c10b-42cb-adc8-db5f5b29dbd7
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49239#018f82d1-5c13-4525-a5a7-d629bf28ce2e
Force-pushed from 64df46a to ebb9af3 (Compare)
Force-pushed to remove the wrapper struct from …
new failures in https://buildkite.com/redpanda/redpanda/builds/48783#018f5349-7685-4a4a-aacf-31045b60fda3:
new failures in https://buildkite.com/redpanda/redpanda/builds/48783#018f5349-7688-4e0e-94b5-2edf9064b89c:
new failures in https://buildkite.com/redpanda/redpanda/builds/48783#018f5352-1312-489e-8aa1-fc7ad40c43b4:
new failures in https://buildkite.com/redpanda/redpanda/builds/48783#018f5352-130f-4450-8ab3-0eebaa089785:
new failures in https://buildkite.com/redpanda/redpanda/builds/49219#018f8150-c106-4580-a04d-a434139cc1ed:
Force-pushed from ebb9af3 to d8dbe9e (Compare)
Force-pushed to rebase onto dev to try to fix the flaky tests.
Although exempt clients are already excluded from having their traffic recorded, they are still throttled if the token bucket is in a negative state because of other clients. To avoid this, exempt clients should get a throttling delay of 0 so they are excluded from throttling altogether.
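For illustration, a minimal sketch of that idea, assuming a hypothetical `compute_throttle_delay` helper and a token bucket whose level can go negative under load (none of these names are from the actual code):

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch, not Redpanda's actual API: exempt clients skip the
// token-bucket delay calculation entirely, so a bucket driven negative by
// other clients can no longer throttle them.
std::chrono::milliseconds compute_throttle_delay(
  bool client_is_exempt, int64_t bucket_level, int64_t rate_bytes_per_sec) {
    if (client_is_exempt) {
        // Exempt clients get a 0 delay and are excluded from throttling.
        return std::chrono::milliseconds{0};
    }
    if (bucket_level >= 0) {
        // Bucket is not in deficit; no throttle needed.
        return std::chrono::milliseconds{0};
    }
    // Delay until the bucket refills back to zero at the configured rate.
    return std::chrono::milliseconds{
      -bucket_level * 1000 / rate_bytes_per_sec};
}
```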
Force-pushed from d8dbe9e to 35ebcfe (Compare)
Force-pushed with changes to track the throttling state separately for fetch, produce, and snc quotas. Now that I added a test with produce clients, I discovered that if we track … Also, I moved the commit with the fix for the snc quota exemption ahead.
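As a rough illustration of what tracking the state separately could look like (all names here are hypothetical, not taken from the patch):

```cpp
#include <chrono>

// Hypothetical sketch: one "throttled until" deadline per quota type, so a
// produce throttle cannot mask or overwrite a fetch or snc throttle.
struct client_throttle_state {
    std::chrono::steady_clock::time_point fetch_throttled_until{};
    std::chrono::steady_clock::time_point produce_throttled_until{};
    std::chrono::steady_clock::time_point snc_throttled_until{};
};
```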
Force-pushed from 35ebcfe to ff96288 (Compare)
Force-pushed to address some more code review feedback.
Are there tests for these?
Can you run a few rounds of ducktape for …
Force-pushed from ff96288 to b540b74 (Compare)
I've force-pushed an explicit test for this now.
This was already there in the last commit.
Locally, I've run these with …
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to the v23.3.x branch. I tried:
Quota enforcement is currently done in a complicated and not-Kafka-compatible way in Redpanda, and this PR intends to fix that.
In Kafka >= 2.0, client throttling is implemented in a simple way. Brokers return how long the client is supposed to be throttled when there is a quota violation. The Kafka client is then supposed to wait until this throttle time has passed, or else the broker applies the throttle on its side.
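As a sketch of the client-side half of that contract (the `throttle_time_ms` field mirrors the Kafka protocol field of that name; the `response` type and helper are hypothetical):

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Minimal model of a response carrying the Kafka throttle_time_ms field.
struct response {
    int32_t throttle_time_ms;
};

// A well-behaved Kafka >= 2.0 client waits out the broker-provided
// throttle before sending its next request; if it does not, the broker
// applies the delay on its side instead.
void honor_throttle(const response& r) {
    if (r.throttle_time_ms > 0) {
        std::this_thread::sleep_for(
          std::chrono::milliseconds{r.throttle_time_ms});
    }
}
```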
However, Redpanda currently enforces the delay differently. For produce, we enforce the throttle we would have sent in the response if there was a throttle in the last response (regardless of whether the client obeyed that throttle). For fetch, we always enforce the current throttle. Only for ingress/egress quotas do we correctly track how long the client was supposed to be throttled.
This PR fixes the throttling behaviour by tracking how long the client is supposed to have waited and applying that throttle on the next request if the client did not wait.
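A minimal sketch of that enforcement model, assuming a hypothetical per-client `throttle_tracker` (illustrative only, not the actual implementation):

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical per-client tracker: remember when the previously returned
// throttle should have elapsed, and enforce only the remainder if the
// client comes back too early.
struct throttle_tracker {
    clock_type::time_point throttled_until{};

    // Called when a response is sent with a non-zero throttle.
    void record(std::chrono::milliseconds throttle) {
        throttled_until = clock_type::now() + throttle;
    }

    // Called on the next request; returns the delay the broker should
    // still apply because the client did not wait out the full throttle.
    std::chrono::milliseconds remaining() const {
        auto now = clock_type::now();
        if (now >= throttled_until) {
            return std::chrono::milliseconds{0};
        }
        return std::chrono::duration_cast<std::chrono::milliseconds>(
          throttled_until - now);
    }
};
```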
Fixes https://redpandadata.atlassian.net/browse/CORE-2752
Backports Required
Release Notes
Bug Fixes