Publishing sometimes fails when ordering key is enabled #1084
Comments
Hi @mukund-ananthu, can you give me details on how your fix will help on this issue? Thanks :)
Hi @pamarc, sure. For publishing with ordering keys: prior to the fix, the retry deadline was set to infinite, but default_timeout was still set to its default of 60s. As a result, for retryable errors, the client library would make an initial set of retry requests to the backend within this 60s window. If the retries had not succeeded within that window, the publish call timed out. With this fix, instead of timing out after 60s and making no further calls to the backend, the client library keeps retrying indefinitely. Once the transient issue in the backend is resolved, a retry from the client library should eventually succeed. From the client library's perspective, the issue was that it was not making the required retries to the backend after the first 60s.
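The difference between the two behaviours described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the client library's retry code: TransientError, publish_with_retry, flaky_backend and the fixed 10s backoff are all hypothetical names and values, and time is simulated with a virtual clock rather than real sleeps.

```python
import itertools


class TransientError(Exception):
    """Stand-in for a retryable backend error (hypothetical, not a library class)."""


def publish_with_retry(send, timeout_s, backoff_s=10.0):
    """Retry `send` on TransientError until it succeeds or `timeout_s` elapses.

    `timeout_s=None` models "keep retrying indefinitely". Each failed attempt
    advances a virtual clock by `backoff_s` instead of sleeping.
    Returns (result, number_of_attempts).
    """
    elapsed = 0.0
    for attempt in itertools.count(1):
        try:
            return send(elapsed), attempt
        except TransientError:
            elapsed += backoff_s
            if timeout_s is not None and elapsed >= timeout_s:
                raise TimeoutError(
                    f"gave up after {elapsed:.0f}s and {attempt} attempts"
                )


def flaky_backend(recovers_at_s):
    """Build a `send` function that fails until the virtual clock reaches `recovers_at_s`."""
    def send(now_s):
        if now_s < recovers_at_s:
            raise TransientError("backend unavailable")
        return "message-id"
    return send
```

With a backend that recovers after 90 simulated seconds, publish_with_retry(flaky_backend(90), timeout_s=60) raises TimeoutError before the backend is healthy again, while timeout_s=None keeps retrying until the publish succeeds.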
Also, if you do want to write application logic to handle such paused states of the library for non-retryable error codes, then when, where, and whether to call resume_publish depends on how you want to respond to the particular error code, and it may not always be appropriate to do it asynchronously with callbacks. For example, if you receive a not-found error (a non-retryable error code) as a result of, say, publishing to a non-existent topic, it may be more appropriate to call resume_publish only after the cause of the error is resolved (the topic is created).
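As a rough sketch of "resume only after the cause is resolved": the helper below blocks on the publish result and calls resume_publish only once a caller-supplied hook reports that the underlying problem has been fixed. publish_then_resume and fix_cause are hypothetical names introduced for illustration; publisher is assumed to be a pubsub_v1.PublisherClient created with message ordering enabled, and only its publish and resume_publish methods are used.

```python
def publish_then_resume(publisher, topic_path, data, ordering_key, fix_cause):
    """Publish one message and block until the publish completes.

    If the publish fails, `fix_cause(exc)` (a hypothetical hook, e.g. one that
    creates the missing topic) is asked to resolve the problem. Only when it
    returns True is publishing resumed for the ordering key and the message
    retried once; otherwise the original error is re-raised.
    """
    future = publisher.publish(topic_path, data, ordering_key=ordering_key)
    try:
        return future.result()
    except Exception as exc:
        if not fix_cause(exc):
            raise
        # The ordering key stays paused after a failed publish until it is
        # explicitly resumed.
        publisher.resume_publish(topic_path, ordering_key)
        retry = publisher.publish(topic_path, data, ordering_key=ordering_key)
        return retry.result()
```

Blocking here is a deliberate choice for the sketch: it keeps the "fix, then resume, then retry" sequence in one place instead of spreading it across callbacks.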
@mukund-ananthu thanks a lot for this detailed explanation, this is very clear.
Environment details
Description
Hello there,
We’ve been running into publishing issues for a while now and have run out of ideas on how to fix them. The behaviour is the following. We publish messages into a few topics that are configured with an ordering key. Multiple times a week (or even a day), the publishing process fails and the client publisher stops (it seems to be stuck?), as no messages are published. Google Cloud Monitoring usually reports ‘deadline exceeded’, and indeed we do see timeouts when that happens (but no exception on the client side). This first observation is a bit weird though, given that we don’t publish massive loads of messages (just a few messages every second).
After (!) reading the documentation, we understand that this behaviour is expected with ordering keys.
Code example
We wanted to avoid calling result() in the main thread as it is blocking, but use a callback instead. Anyway, this does not work, as we still have the same issues. Can anyone help?
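For reference, one way the callback approach described above might look is sketched here. It is not the code from this report: publish_nonblocking, drain_failures and the failures queue are hypothetical names, and publisher is assumed to be a pubsub_v1.PublisherClient with message ordering enabled whose publish call returns a future supporting add_done_callback and exception. The callback only records the failure; deciding whether to call resume_publish is left to the main loop.

```python
import queue


def publish_nonblocking(publisher, topic_path, data, ordering_key, failures):
    """Publish without blocking; record a failed ordering key on `failures`."""
    future = publisher.publish(topic_path, data, ordering_key=ordering_key)

    def on_done(done_future):
        # Runs once the future has completed, so exception() does not block.
        exc = done_future.exception()
        if exc is not None:
            failures.put((ordering_key, exc))

    future.add_done_callback(on_done)
    return future


def drain_failures(failures):
    """Return all (ordering_key, exception) pairs recorded so far, without blocking."""
    drained = []
    while True:
        try:
            drained.append(failures.get_nowait())
        except queue.Empty:
            return drained
```

The main loop could call drain_failures(failures) periodically, with failures being a queue.Queue(), and decide per error whether resuming the affected key makes sense.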