Pub/Sub Subscriber does not catch & retry UNAVAILABLE errors #4234
Comments
Regenerating pubsub to use api_core should solve this, I think. |
How would I do that? |
We're working on it. :) |
We're able to reproduce this consistently within 1-2 minutes of starting a snippet virtually identical to the one on https://cloud.google.com/pubsub/docs/pull#pubsub-pull-messages-async-python. There are no messages published on the topic. Our stacktrace is a bit different, though:
|
Any update about this issue? |
@kir-titievsky, the solution you mentioned seems to work at first, but the problem with this approach seems to be that the CPU usage goes up every time the UNAVAILABLE exception is handled this way. Thus, a fix seems to be necessary. |
Hey folks, Hoping to have a fix today or tomorrow. |
Any update on this issue? |
I commented out the exception-raising code, but events are still not delivered consistently. I currently have subscriptions for insert/delete/preempt VM operations, and I consistently miss preempt events. My edit was to skip the call to self._policy.on_exception(exc) in the _blocking_consume method. |
Sorry, I forgot to tag this issue. #4265 fixed this, and has been merged. A new version will be cut soon (within a day or two). |
Wait, sorry, I got confused. I still need to do the Re-opening, and a fix will be in soon. |
How is this going now Luke @lukesneeringer ? |
I am also hitting this issue using the code samples. |
Do we have any workaround? My consumer gets stuck completely and does not recover even after calling open again. |
One workaround that seemed to work for me was to generate a publisher client that sends a dummy pubsub message once every, e.g., 10 seconds to the subscriber client just to prevent it from throwing the exception. Not sure if that would work for everyone though... |
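For anyone who wants to try the keepalive idea above, here is a minimal sketch. It assumes you pass in your own `publish` callable; `start_keepalive` and its parameters are hypothetical names made up for illustration, not part of the Pub/Sub client. As noted elsewhere in this thread, this only papers over the problem.

```python
import logging
import threading

_LOGGER = logging.getLogger(__name__)


def start_keepalive(publish, interval_seconds=10.0):
    """Call publish() every interval_seconds on a background thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def _loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop body runs once per interval until stopped.
        while not stop.wait(interval_seconds):
            try:
                publish()
            except Exception:
                # A failed keepalive should not kill the loop.
                _LOGGER.exception("keepalive publish failed")

    thread = threading.Thread(target=_loop)
    thread.daemon = True  # do not block interpreter exit
    thread.start()
    return stop


# Hypothetical usage with a real publisher client (not executed here):
#   stop = start_keepalive(lambda: publisher.publish(topic_path, b"keepalive"))
#   ...
#   stop.set()
```

Taking the publish step as a callable keeps the timer logic independent of the client library, so it can be exercised without network access.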
@sachin-shetty @murataksoy I tried the dummy-message approach, publishing every 30 seconds, but still got a crash after 2 days. |
@makrusak thanks for the feedback, I should also watch out for this I guess. One workaround I considered (but never implemented) was to stop / start the client itself, again every ~10 seconds. From what I understand, a client does not have to be listening at the time a pubsub message is sent, it can receive the message afterwards if it can start listening before a certain timeout is reached after the message is sent. Could that be an alternative solution? |
@murataksoy It can be a workaround. But all these approaches are too fragile. I can't imagine how to use them in a production environment and not lose sleep :) |
I opened a ticket a few weeks ago asking whether I should upgrade to the latest version or wait until all these issues are resolved. Although I was advised to stay on the latest version, the number of issues and workarounds I have to put in my code drives me nuts. So in prod I am still on a pretty old version, 0.24.0. |
@makrusak I totally agree, a fix is necessary. These are just some workarounds that some people might be willing to consider depending on their application until this bug is fixed. |
Hi, Any updates or ETAs here? |
@frankcarey You can follow along on #4444 for now. I am working actively to squash this and related bugs. |
- Adding special handling for API core exceptions - Retrying on both types of idempotent error - Also doing a "drive-by" hygiene fix changing a global from `logger` to `_LOGGER` Towards googleapis#4234.
- Adding special handling for API core exceptions - Retrying on both types of idempotent error Towards #4234.
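To illustrate what "retrying on idempotent errors" means in general terms, here is a rough sketch. It is not the code from the referenced commits: `RetryableError`, `consume_with_retry`, `open_stream`, and `handle` are hypothetical stand-ins, and the real client works with gRPC and api_core exception types rather than this toy class.

```python
class RetryableError(Exception):
    """Stand-in for a transient failure such as UNAVAILABLE."""


def consume_with_retry(open_stream, handle, max_restarts=5):
    """Consume responses, reopening the stream after retryable errors.

    open_stream: callable returning an iterable of responses.
    handle: callable invoked once per response.
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            for response in open_stream():
                handle(response)
            return restarts  # the stream ended cleanly
        except RetryableError:
            if restarts >= max_restarts:
                raise  # give up after too many consecutive reopen attempts
            restarts += 1
        # Any other exception type propagates to the caller unchanged.
```

The point of the split is that only errors known to be safe to retry reopen the stream; everything else still surfaces to the caller.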
Fixed by #4444. I ran a reproducible test case against the current master for 659 seconds and it did not fail with
|
FYI for all those following this issue, I have pushed a release ( I don't think all issues have been resolved, but this at least will "gracefully" handle inactivity. (The |
Why is this issue closed? |
@makrusak This issue was resolved because the implementation now correctly handles There are still a few very important issues to be tackled, though #3965 seems closest to what you're describing and isn't currently marked as "p1" (but it probably should be). A few questions for you:
|
@dhermes Yes, I installed and deployed 29.1 a few minutes after your comment. |
Which guide are you referring to? (I'm not on the team that writes the guides and I wasn't part of the original implementation of Pub / Sub, so sorry for my ignorance.) |
@dhermes My logs look like this: https://pastebin.com/Dn9Lnynb (and so on). As you can see, the initial request is sent again every 1.5 minutes.
while True:
    subscription = subscriber.subscribe(subscription_path, callback=run_combine_process)
    subscription.future.result() |
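A loop like the one above resubscribes immediately after every failure. One possible variation, sketched here under assumptions, pauses between restarts with a capped exponential delay. `backoff_delays`, `run_forever`, and `subscribe_and_wait` are hypothetical names; `subscribe_and_wait` stands for a function wrapping the two lines inside the loop.

```python
import time


def backoff_delays(initial=1.0, multiplier=2.0, maximum=60.0):
    """Yield an endless, capped exponential sequence of delays in seconds."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * multiplier, maximum)


def run_forever(subscribe_and_wait, sleep=time.sleep, max_failures=None):
    """Call subscribe_and_wait() again after each failure, pausing in between.

    subscribe_and_wait should block until the subscription fails (raises)
    or is shut down (returns). Returns the number of failures seen.
    """
    delays = backoff_delays()
    failures = 0
    while max_failures is None or failures < max_failures:
        try:
            subscribe_and_wait()
            return failures  # returned without error: treat as shutdown
        except Exception:
            failures += 1
            sleep(next(delays))
    return failures
```

This does not address leaked threads or an unclosed consumer; it only keeps the restart loop from spinning.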
@makrusak I just cut another release https://pypi.org/project/google-cloud-pubsub/0.29.2/. This will at least stop some of the threads that weren't previously being stopped. However, it won't close the bidirectional consumer(s) on failure. You can do that by calling |
@dhermes Thanks! I'll validate it today and hope that all critical bugs will be fixed soon. |
@makrusak I've definitely traced down at least one place where threads are leaked, but I can't get the CPU usage to spike much more than 3% after running for 300 seconds (which ends up recovering from |
I've updated from 0.28.4 to 0.29.2, and now I get the following error exactly 90 seconds after the client starts listening:
|
@itamar-resonai That line is from Maybe there is an issue with your install? |
Sorry, my bad. It was indeed 0.29.2, but I didn't disable the |
It seems like the error still persists with pubsub
I am seeing the traceback:
In case it matters, I'm running my code on GKE. |
Now, testing with
My relevant
|
Thanks, @brianbaquiran, let's track this over at #4234. |
@theacodes , the issue you linked to in the comment above is this self-same issue. :) Is there another issue you meant to link that discusses catching internal-thread pubsub exceptions? I haven't been able to find a satisfactory answer. Thanks. |
A basic Pub/Sub message consumer stops consuming messages after a retryable error (see stack trace below, but in short
_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, The service was unable to fulfill your request. Please try again. [code=8a75])>
). The app does not crash, but the stream never recovers to continue receiving messages.
Interesting observations:
Expected behavior:
This might be the same issue as 2683. This comment, in particular, seems like the solution that I would expect the client library to implement.
Answers to standard questions:
MacOS Sierra 10.12.6
python --version
Python 2.7.10 (running in virtualenv)
pip show google-cloud, pip show google-<service> or pip freeze