
Fixes segfault on SIGINT #14

Closed
wants to merge 1 commit into from

Conversation

attdona

@attdona attdona commented May 17, 2021

  • Removed all finalizer calls: as far as I know, it is hard to predict the order in which finalizers run,
    and this may break the order of API calls that librdkafka expects.

  • Added kafka_consumer_close.

  • Removed the kafka_poll timer from KafkaClient.

@dfdx
Owner

dfdx commented May 17, 2021

Could you please describe what kind of issues you got with the kafka_poll timer? I may have misused the Timer type, but I think we still need to call kafka_poll at regular intervals to make callbacks and other stuff work correctly (e.g. see #13).

@attdona
Author

attdona commented May 18, 2021

If you call rd_kafka_destroy, whether via a finalizer or via an explicit close method, then a later timer callback will cause a segfault: by the time the timer triggers the callback, rd_kafka_destroy has already terminated the Kafka threads.

I think this is why the segfault is not always reproducible: it depends on whether the timer fires after client termination.

I don't see any reason to call poll in a background task; are you sure it is needed?
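To make the race concrete, here is a minimal sketch of the hazard being described, with the destroyed handle simulated by a flag instead of a real `rd_kafka_t` pointer (all names here are hypothetical, not RDKafka.jl's actual internals); with a real handle the late callback would touch freed memory and segfault:

```julia
# Hypothetical sketch: a periodic Timer keeps firing after the client
# has been "destroyed". The destroy is simulated with a Bool flag, so
# the would-be segfault shows up as a recorded late poll instead.
mutable struct FakeClient
    destroyed::Bool
end

late_polls = String[]
client = FakeClient(false)

t = Timer(0.05; interval=0.05) do _
    if client.destroyed
        # with a real handle, polling here would hit freed memory
        push!(late_polls, "poll after destroy")
    end
end

sleep(0.3)               # timer fires a few times while client is alive
client.destroyed = true  # analogous to calling rd_kafka_destroy()
sleep(0.3)               # the timer is still scheduled and fires again
close(t)
```

After this runs, `late_polls` is non-empty: the timer outlived the "destroyed" client, which is exactly the window in which the real segfault occurs.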

@dfdx
Owner

dfdx commented May 18, 2021

According to rdkafka.h (which is one of the primary sources of documentation btw):

@remark An application should make sure to call poll() at regular
intervals to serve any queued callbacks waiting to be called.
@remark If your producer doesn't have any callback set (in particular
via rd_kafka_conf_set_dr_msg_cb or rd_kafka_conf_set_error_cb)
you might choose not to call poll(), though this is not
recommended.

Similar to what's proposed in #13, I think we can add a flag .polling to the KafkaClient and use it to stop polling just before calling rd_kafka_destroy() in Base.close(::KafkaClient). Do you think it will be sufficient to avoid the conflicts?
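A minimal sketch of the proposed `.polling` flag, assuming hypothetical field and function names (the real poll call would be a ccall into librdkafka, simulated here by a counter):

```julia
# Sketch of the ".polling" flag idea: the poll loop checks the flag on
# every iteration, and Base.close flips it off before the handle would
# be destroyed, so no poll can run against a dead handle.
mutable struct KafkaClient
    polling::Bool
    polls::Int          # stand-in for work done by rd_kafka_poll
end

function start_polling!(c::KafkaClient)
    @async begin
        while c.polling
            c.polls += 1    # real client: ccall rd_kafka_poll here
            sleep(0.01)
        end
    end
end

function Base.close(c::KafkaClient)
    c.polling = false       # stop the poll loop first...
    sleep(0.05)             # ...give the task a chance to see the flag
    # only now would it be safe to call rd_kafka_destroy(c.handle)
end

c = KafkaClient(true, 0)
task = start_polling!(c)
sleep(0.05)
close(c)
```

The key ordering is inside `close`: the flag is cleared, the task is allowed to exit, and only then would the handle be destroyed.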

@kschzt
Contributor

kschzt commented May 18, 2021

Indeed, calling rd_kafka_poll() is required for delivery reports and error callbacks. If it isn't called, the application won't receive errors from librdkafka and can end up in a dead state. As a producer, there is no alternative to rd_kafka_poll(); as a consumer you can get away with only rd_kafka_consumer_poll() if you call rd_kafka_poll_set_consumer() to redirect the callbacks to it. I've chosen to just call rd_kafka_poll() in both cases in #13. HTH

@attdona
Author

attdona commented May 18, 2021

Nice: in #13 rd_kafka_poll is guarded so it will not be called after the Kafka destructor, but it is called inside a thread. If SIGINT is delivered to that thread, it prevents a clean shutdown of the application.

Would it work if the InterruptException were caught and rethrown?

Also, a Timer may cause problems with SIGINT; see here.
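The catch-and-rethrow idea above could look something like this sketch (the `poll_loop` function and its arguments are hypothetical, not code from the library; the real poll would be a ccall into librdkafka):

```julia
# Hypothetical sketch: catch InterruptException inside the poll loop,
# flip the running flag so the loop winds down cleanly, then rethrow
# so the caller (and ultimately the application) still sees Ctrl-C.
function poll_loop(poll::Function, running::Ref{Bool})
    try
        while running[]
            poll()                   # rd_kafka_poll in the real client
        end
    catch e
        e isa InterruptException || rethrow()
        running[] = false            # request a clean shutdown first
        rethrow()                    # then re-raise the interrupt
    end
end

running = Ref(true)
calls = Ref(0)
interrupted = try
    # simulate SIGINT arriving on the third poll
    poll_loop(running) do
        calls[] += 1
        calls[] >= 3 && throw(InterruptException())
    end
    false
catch e
    e isa InterruptException
end
```

Here `interrupted` ends up `true` and `running[]` ends up `false`: the loop shuts itself down but the interrupt still propagates to the caller.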

@dfdx
Owner

dfdx commented May 18, 2021

I've just merged #13 with polling in the @async-created task (i.e. in the same thread). If I'm not missing anything, rebasing this PR on master should now be sufficient to avoid the segfault. @attdona Do you see any other issues?

@attdona
Author

attdona commented May 19, 2021

Thanks! The segfault issue is now resolved. I've posted a note in #13 regarding the use of an @async task inside the library, which is a possible source of the uncatchable `fatal: error thrown and no exception handler available` error.

@dfdx dfdx deleted the branch dfdx:master October 9, 2024 22:52
@dfdx dfdx closed this Oct 9, 2024