Seeing occasional segfaults with latest release #4195
Comments
@wmorgan6796 Thanks for the report! rd_kafka_assert(NULL, thrd_is_current(rktp->rktp_rkt->rkt_rk->rk_thread)); It's because …
@emasab what's the criticality of this bug? If the fix isn't already in progress, I'd like to take a crack at it so we can fix the segfaults occurring in our environments.
@wmorgan6796 how often does it occur? Could you get a debug log with "debug=consumer,cgrp,topic,fetch"? It's clear to me how to fix it, but it's difficult to create a test that fails this way, so it would be helpful to have the sequence of events. The scenario is that the follower lease expires and the consumer migrates back to the leader; this happens while it was doing a ListOffsets to get the earliest or latest offset, so it calls …
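For reference, a minimal sketch of enabling the debug categories requested above through the librdkafka C API. Only the "debug" value comes from the comment; the group.id, bootstrap.servers, and the helper name are illustrative assumptions.

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: build a consumer with the debug categories asked for above. */
rd_kafka_t *create_debug_consumer(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Debug categories requested in the comment above. */
        if (rd_kafka_conf_set(conf, "debug", "consumer,cgrp,topic,fetch",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "%s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return NULL;
        }

        /* Illustrative values; error handling omitted for brevity. */
        rd_kafka_conf_set(conf, "group.id", "segfault-repro", errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092", errstr, sizeof(errstr));

        /* On success rd_kafka_new() takes ownership of conf. */
        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
        if (!rk)
                fprintf(stderr, "%s\n", errstr);
        return rk;
}
```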
Could reproduce it using a seek to latest while fetching from follower, while manually reducing …
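A rough sketch of the seek-to-latest step mentioned above, using the librdkafka C API. The topic name, partition, and timeout are illustrative; fetch-from-follower (e.g. via client.rack on the consumer) and the shortened lease referred to in the truncated comment are assumed to be set up separately.

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: seek a partition to "latest", which issues a ListOffsets request.
 * The window described above is the follower lease expiring and the partition
 * migrating back to the leader while that request is still outstanding. */
static void seek_to_latest(rd_kafka_t *rk, const char *topic, int32_t partition) {
        rd_kafka_topic_partition_list_t *parts =
                rd_kafka_topic_partition_list_new(1);
        rd_kafka_topic_partition_list_add(parts, topic, partition)->offset =
                RD_KAFKA_OFFSET_END; /* "latest" */

        rd_kafka_error_t *err = rd_kafka_seek_partitions(rk, parts, 5000 /* ms */);
        if (err) {
                fprintf(stderr, "seek failed: %s\n", rd_kafka_error_string(err));
                rd_kafka_error_destroy(err);
        }
        rd_kafka_topic_partition_list_destroy(parts);
}
```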
Linked fix: "… lease expires and the partition is waiting for a list offsets result" (closes #4195)
Hey @emasab, I'm unsure what logs I can provide; per my company's policy, sharing internal logs is quite difficult as they have to be cleansed by hand of any identifying information. I'm more than happy to test a build, but I'm not sure I can do more than that at this point (since the implementation of the fix is already ready for review).
Description
We've recently done a large-scale rollout of librdkafka 2.0.2 (1000+ nodes) and we're seeing occasional segmentation faults in the library. We've noticed the segmentation faults seem to occur quite a bit more when there are heavy rebalances going on in the cluster for a consumer group.
Some important notes:
Our services are 100% modern C++17, compiled statically using GCC 12.2 on Ubuntu 20.04 (Focal).
Stack trace here:
How to reproduce
I have not been able to reliably reproduce the issue locally and therefore don't have much more to go on than what is above from the segfault. I'm not sure if there is something wrong with the library or if it's a smoking gun elsewhere.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka client configuration: <REPLACE with e.g., message.timeout.ms=123, auto.reset.offset=earliest, ..>
Provide logs (with debug=.. as necessary) from librdkafka

Global Consumer Config
Topic Consumer Config
Producer Config
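One way to capture the configuration requested by this checklist, assuming the librdkafka C API, is to dump the effective conf object as name/value pairs; print_conf is an illustrative helper, not part of the library.

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: print every configuration property of a conf object as name=value,
 * which can be pasted into the config sections above (after redacting secrets). */
void print_conf(rd_kafka_conf_t *conf) {
        size_t cnt;
        const char **pairs = rd_kafka_conf_dump(conf, &cnt);

        /* The dump is a flat array of alternating names and values. */
        for (size_t i = 0; i + 1 < cnt; i += 2)
                printf("%s=%s\n", pairs[i], pairs[i + 1]);

        rd_kafka_conf_dump_free(pairs, cnt);
}
```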