cdc: add observability metrics to throttling behaviour in kafka #117618
wenyihu6 added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and A-cdc (Change Data Capture) labels on Jan 10, 2024.
cc @cockroachdb/cdc
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Jan 11, 2024:
This patch injects crdb cdc metrics into sarama code to provide more observability into throttling behaviour from kafka. Fixes: cockroachdb#117618. Release note: (to add later). TODO(wenyihu6): discuss if this is a good approach + what other sarama metrics are useful and should be added as well.
wenyihu6 added further commits to wenyihu6/cockroach that referenced this issue on Jan 24, Jan 29, Jan 30, and Jan 31, 2024, each carrying the same commit message as the Jan 11 entry above.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Feb 1, 2024:
Now that this patch has been merged, sarama acknowledges and reacts to the kafka server's throttling messages by slowing down. To provide better observability into sarama code, this patch adds a metrics registry interceptor and a new metric `changefeed.kafka_throttling_hist_nanos`, which tracks the time (in nanos) spent in sarama's throttling when cockroachdb exceeds the kafka quota. Release note: `changefeed.kafka_throttling_hist_nanos` has been added to metrics to monitor sarama throttling behavior resulting from exceeding the kafka quota. Fixes: cockroachdb#117618.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Feb 1, 2024:
This patch adds a new roachtest to observe kafka throttling behaviour. It's a bit tricky to find the passing condition here, since we are expecting throttling behaviour and would like to assert that the latency is above the target. Part of: cockroachdb#117618. Release note: None.
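The passing condition described here is inverted relative to a normal latency check: the test should pass only when latency stays above the target, since that is the evidence that kafka throttling actually kicked in. The roachtest harness itself isn't shown; a hypothetical, framework-free sketch of that assertion might be:

```go
package main

import (
	"fmt"
	"time"
)

// checkThrottled is a hypothetical stand-in for the roachtest's passing
// condition: it fails unless observed latency sits above the target,
// because latency above target is what indicates kafka throttling slowed
// the changefeed down.
func checkThrottled(observed, target time.Duration) error {
	if observed <= target {
		return fmt.Errorf("expected kafka throttling to push latency above %s, observed %s", target, observed)
	}
	return nil
}

func main() {
	// Example: with a 10s target, 45s of observed latency indicates throttling.
	if err := checkThrottled(45*time.Second, 10*time.Second); err != nil {
		fmt.Println(err)
	}
}
```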
wenyihu6 added three more commits to wenyihu6/cockroach that referenced this issue on Feb 2, 2024, each carrying the same commit message as the Feb 1 metrics entry above (the later two also noting that the prerequisite merged patch was cockroachdb#117544).
craig bot pushed a commit that referenced this issue on Feb 6, 2024. The batch merge included:

117117: kv: log slow requests on replica level in addition to range level (r=shralex, a=shralex). Previously, slow requests were only logged at the range level, and the logs did not indicate which replica was slow. Moreover, the SlowRPC metric attempted to represent the number of requests currently being retried, but it was tracked at the range level and therefore missed a second level of replica-level retries happening underneath. This PR adds logging at the replica level, removes a confusing log line, and changes the metric to count the number of slow requests in a simpler manner. Epic: https://cockroachlabs.atlassian.net/browse/CRDB-33510. Fixes: #114431.

117693: changefeedccl: add observability metrics into sarama code (r=rharding6373, a=wenyihu6). Now that this patch (#117544) has been merged, sarama acknowledges and reacts to the kafka server's throttling messages by slowing down. To provide better observability into sarama code, this patch adds a metrics registry interceptor and a new metric `changefeed.kafka_throttling_hist_nanos`, which tracks the time (in nanos) spent in sarama's throttling when cockroachdb exceeds the kafka quota. Fixes: #117618. Release note: `changefeed.kafka_throttling_hist_nanos` has been added to metrics to monitor sarama throttling behavior resulting from exceeding the kafka quota.

118372: sql: fix flake in TestTxnContentionEventsTable (r=yuzefovich, a=michae2). In causeContention we deliberately hold a transaction open using pg_sleep to block an update statement. The timing we're trying to achieve is: (1) transaction insert; (2) update starts and blocks; (3) transaction held open using pg_sleep. We were using a WaitGroup to order (2) after (1), but there was no synchronization to ensure (3) came after (2). This commit adds a retry loop that checks `crdb_internal.cluster_queries` to ensure (3) comes after (2). Fixes: #118236. Release note: None.

118760: builtins: allow VIEWACTIVITY priv to use crdb_internal.request_statement_bundle (r=xinhaoz, a=xinhaoz). Previously only those with the VIEWACTIVITY role option could use the crdb_internal.request_statement_bundle builtin. We should allow the VIEWACTIVITY privilege as well since role options are now deprecated. This also allows stmt bundle requests to be made from db-console for users with this privilege granted. Epic: none. Fixes: #118759. Release note (bug fix): Those with the VIEWACTIVITY privilege can now request statement bundles using crdb_internal.request_statement_bundle or via db-console's sql activity page.

118767: release: confirm yum install (r=celiala, a=rail). This adds the `-y` flag so the `yum` install runs without a user prompt. Epic: none. Release note: None.

118789: jobs,application_api: replace calls to `skip.Stress` with `skip.Duress` (r=celiala, a=rickystewart). `skip.Duress()` seems like it should have been used in this case as it gives more time under both `race` and `deadlock`. This will give these tests some extra time if they run in a heavyweight configuration but not "under stress". Epic: CRDB-8308. Release note: None.

118792: kvfollowerreadsccl: skip test under `race` not `stressrace` (r=celiala, a=rickystewart). Epic: CRDB-8308. Release note: None.

118797: bincheck: do not run geos tests on Windows (r=celiala, a=rail). In #106642 we stopped shipping libgeos on Windows, but didn't update the bincheck test to reflect the change. Epic: none. Release note: None.

Co-authored-by: shralex <[email protected]>, Wenyi Hu <[email protected]>, Michael Erickson <[email protected]>, Xin Hao Zhang <[email protected]>, Rail Aliiev <[email protected]>, Ricky Stewart <[email protected]>
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Feb 21, 2024, again carrying the same commit message as the Feb 2 entries.
wenyihu6 added commits to wenyihu6/cockroach that referenced this issue on Mar 22 and Mar 25, 2024, each carrying the same commit message as the Feb 1 roachtest entry above.
We should find a way to add observability into sarama code (https://github.com/IBM/sarama/blob/d9abf3cfb14d41fbc059d0f36195603000b1aad2/broker.go#L1695C1-L1696C38) when there is throttling behaviour from kafka.
One way is to inject our metrics into sarama's code using the metricRegistry (https://github.com/IBM/sarama/blob/d9abf3cfb14d41fbc059d0f36195603000b1aad2/async_producer.go#L89).
Part of: #92290
Jira issue: CRDB-35272
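To make the metricRegistry injection concrete, here is a hypothetical, framework-free sketch (not taken from the CockroachDB changefeed code) that hands sarama a registry we own and polls it for the throttle histogram. The upstream IBM/sarama import path, the placeholder broker address, and the "throttle-time-in-ms" metric name are assumptions worth verifying against the vendored sarama version.

```go
package main

import (
	"fmt"
	"log"
	"strings"
	"time"

	"github.com/IBM/sarama"
	"github.com/rcrowley/go-metrics"
)

func main() {
	// Hand sarama a registry we control so its metrics, including the
	// broker throttle-time histograms, can be read back later.
	registry := metrics.NewRegistry()

	config := sarama.NewConfig()
	config.MetricRegistry = registry

	// "localhost:9092" is a placeholder broker address.
	producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Periodically poll the registry and surface the throttle time that
	// sarama records when the kafka broker pushes back.
	go func() {
		for range time.Tick(10 * time.Second) {
			registry.Each(func(name string, m interface{}) {
				if h, ok := m.(metrics.Histogram); ok && strings.Contains(name, "throttle-time-in-ms") {
					snap := h.Snapshot()
					fmt.Printf("%s: count=%d mean=%.1fms max=%dms\n",
						name, snap.Count(), snap.Mean(), snap.Max())
				}
			})
		}
	}()

	// ... produce messages on producer.Input() and keep the process alive ...
}
```

Polling like this only surfaces aggregates after the fact; the interceptor approach sketched earlier in the thread forwards each sample as it happens, which is what a histogram such as `changefeed.kafka_throttling_hist_nanos` needs.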