Canary reports bouts of "The provided member is not known in the current generation" when consuming messages leading to incorrect latency numbers #161
Comments
Can you enable Sarama logging and provide an updated canary log? Looking at Sarama, I see this error is raised when joining a consumer group, which seems to happen frequently (rebalance?). Sarama logging could give us more hints about the underlying problem. https://github.com/Shopify/sarama/blob/main/consumer_group.go#L253
@ppatierno I don't have a reproducer for this issue at the moment. Any idea how we might induce a state like this? I imagine the service-side logs might be informative too.
I don't know if this helps (still looking), but I found this comment, IBM/sarama#1866, which suggests tuning timeouts might help. The thread IBM/sarama#2118 looks potentially interesting too.
I want to withdraw this comment.
To move forward on this, I wonder about adding the ability to control logging (including Sarama logging) dynamically, so that logging can be enabled easily when the issue is seen.
@ppatierno @tombentley asked elsewhere why the canary uses a consumer group at all. The canary's role is just to measure message latency, and it should use the simplest way to achieve that goal. Are there good reasons to use a consumer group for the canary?
I can't think of any off the top of my head.
I used a consumer group because that is what I was used to doing with Java clients, but I don't see any specific reason why we couldn't switch to not using one. The only case for keeping it would be wanting to scale the Canary application to have more consumers, but that's really not our case given the purpose of the Canary itself.
We are using strimzi-canary 0.2.0. Occasionally we see extended periods where the canary reports "The provided member is not known in the current generation". During this time, the end-to-end message latency observed by the canary appears to become extended. We also see spikes in the strimzi_canary_consumer_error_total metric, which correlate with the appearance of the messages.
Log from a canary instance that experienced several lengthy bouts of the problem:
canary.log