We are observing that when there is heavy traffic on the network (high bandwidth utilization), consumers are losing some messages: for example, if 1000 messages are sent by the producer, only 995 are received for a topic. When this happens I see the error messages below in my consumer (note that the producer received ACKs for all 1000 messages).
%5|1726158458.810|REQTMOUT|rdkafka#producer-37| [thrd:kafka-2.kafka.confluent.svc.cluster.local:9092/2]: kafka-2.kafka.confluent.svc.cluster.local:9092/2: Timed out ApiVersionRequest in flight (after 10009ms, timeout #0)
%4|1726158458.810|FAIL|rdkafka#producer-37| [thrd:kafka-2.kafka.confluent.svc.cluster.local:9092/2]: kafka-2.kafka.confluent.svc.cluster.local:9092/2: ApiVersionRequest failed: Local: Timed out: probably due to broker version < 0.10 (see api.version.request configuration) (after 10009ms in state APIVERSION_QUERY)
%3|1726158458.811|ERROR|rdkafka#producer-37| [thrd:app]: rdkafka#producer-37: kafka-2.kafka.confluent.svc.cluster.local:9092/2: ApiVersionRequest failed: Local: Timed out: probably due to broker version < 0.10 (see api.version.request configuration) (after 10009ms in state APIVERSION_QUERY)
%4|1726158458.811|REQTMOUT|rdkafka#producer-37| [thrd:kafka-2.kafka.confluent.svc.cluster.local:9092/2]: kafka-2.kafka.confluent.svc.cluster.local:9092/2: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
%4|1726158463.397|SESSTMOUT|rdkafka#consumer-38| [thrd:main]: Consumer group session timed out (in join-state steady) after 45014 ms without a successful response from the group coordinator (broker 3, last error was Success): revoking assignment and rejoining group
%4|1726158465.227|COMMITFAIL|rdkafka#consumer-38| [thrd:main]: Offset commit (unassigned partitions) failed for 12/12 partition(s) in join-state wait-unassign-to-complete: Broker: Unknown member: <omitted_topic_name>[0]@147(Broker: Unknown member), <omitted_topic_name>[1]@83(Broker: Unknown member), <omitted_topic_name>[2]@92(Broker: Unknown member), 53955e13-cba3-4fe0-b72b-36b00f6ae...
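Not a confirmed fix, but the timeouts in these logs line up with librdkafka defaults: `api.version.request.timeout.ms` defaults to 10 s (the ApiVersionRequest timed out "after 10009ms") and `session.timeout.ms` defaults to 45 s (the SESSTMOUT fired after 45014 ms). One experiment under network congestion is to raise these limits and disable auto-commit so a rebalance cannot commit past unprocessed messages. A minimal sketch of such a consumer configuration, where the key names are real librdkafka properties but the chosen values, the bootstrap address, and the group id are assumptions to tune for your environment:

```python
# Sketch of librdkafka consumer settings to experiment with when the
# network is saturated. Values are guesses, not verified fixes.
consumer_conf = {
    "bootstrap.servers": "kafka-0.example:9092",  # placeholder address
    "group.id": "my-consumer-group",              # placeholder group id
    "session.timeout.ms": 90000,       # default 45000; SESSTMOUT fired at ~45 s
    "heartbeat.interval.ms": 10000,    # keep well below session.timeout.ms
    "max.poll.interval.ms": 600000,    # allow slower processing under load
    "socket.keepalive.enable": True,   # help keep congested connections alive
    "api.version.request.timeout.ms": 30000,  # default 10000; REQTMOUT hit ~10 s
    "enable.auto.commit": False,       # commit only after processing, so a
                                       # rebalance cannot skip unread messages
}
```

With `enable.auto.commit` off, the application would commit offsets explicitly after each batch is fully processed, which narrows whether the "missing" messages were skipped by an auto-commit during the rebalance seen in the COMMITFAIL log.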
It would be helpful if someone could suggest why this is happening. Is this a known bug? Is any librdkafka config tuning required, or what are the next steps to fix this issue?
How to reproduce
I am not sure this can be reproduced easily, as it happens in our on-prem, air-gapped setup with multiple producer and consumer nodes. Flooding the network while producing around 50k messages and consuming them might reproduce it.
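When attempting the reproduction, it helps to tag every message with a unique key so that any gap can be attributed to specific messages (and from there to partitions/offsets), rather than just observing a count mismatch. A small helper for diffing the produced set against the consumed set; the function name and keys are hypothetical:

```python
def find_missing(produced_keys, consumed_keys):
    """Return keys that were produced (and ACKed) but never consumed."""
    return sorted(set(produced_keys) - set(consumed_keys))

# Example shape of the check: 1000 unique keys produced, 995 consumed.
produced = [f"msg-{i}" for i in range(1000)]
dropped = {"msg-17", "msg-403", "msg-404", "msg-650", "msg-999"}  # simulated loss
consumed = [k for k in produced if k not in dropped]

missing = find_missing(produced, consumed)
print(f"{len(missing)} messages missing: {missing}")
```

Logging which partitions the missing keys were assigned to would show whether the loss clusters around the partitions revoked in the rebalance above.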
Checklist
- librdkafka version: 1.8.2
- Apache Kafka / Confluent Platform version: 7.6.0
- Operating system: Ubuntu 20.04.6 LTS