Problem
We have observed that when a connectivity problem triggers a consumer group change, Vector intermittently stops consuming from the associated topic and consumer lag builds up in Kafka.
rust-rdkafka has several issues introduced between versions 0.34.0 and 0.36.0.
Vector's Kafka source consumes via the StreamConsumer, which is the consumer type those issues identify as affected.
Those issues report that v0.35.0 should be unaffected. However, judging from fede1024/rust-rdkafka#666, which fixes the issue, the bug would still be present in v0.35.0 for the StreamConsumer, as seen here.
Configuration
No response
Version
0.40.0
Debug Output
No response
Example Data
No response
Additional Context
We propose upgrading rdkafka to version 0.37.0, since we've identified that the fix for the StreamConsumer is included in that release.
Until then, we have to monitor Kafka and restart Vector when this happens. It is also worth noting that, because of #21134, we can't simply monitor Vector itself: the metrics that would indicate something is wrong are currently incorrect.
References
No response
So I think #21134 is actually caused by the StreamConsumer race. I looked at where these metrics come from in Vector, and it's just a callback into the rust-rdkafka library. If the consumer thread goes idle for a particular partition, it would make sense that the lag metric for that partition stops being updated.
ADustyOldMuffin changed the title from "rdkafka v0.35.0 has a StreamConsumer race condition that causes a deadlock on consumer group changes" to "On kafka consumer rebalance, Vector consumer stops consuming." on Dec 28, 2024.