messages sit in queue until GKE pod with subscriber gets reset #11
Comments
From @callmehiphop on October 1, 2017 15:50 We've seen a number of reports of messages not being delivered in k8s, I believe this issue is being investigated internally, although I do not know the status. @lukesneeringer have we heard any news in regards to this? |
From @eyalse on October 1, 2017 22:33 I'm suffering from the same issue at the moment :( |
From @ApeNox on October 2, 2017 8:5 Suffering the same issue too, please provide a fix as these are production used tools. |
From @eyalse on October 2, 2017 9:26 @callmehiphop (@lukesneeringer) hey any update? as mentioned these tools (k8s and pubsub) are used in production. |
From @callmehiphop on October 2, 2017 14:54 I don't have any official updates, but a new patch release was made this morning that might resolve the issues you're seeing. |
From @ShahNewazKhan on October 2, 2017 20:25 @callmehiphop I have done some preliminary testing with the google-cloud/pubsub patch version 0.14.3 released this morning, and it looks promising. So far I have not been able to reproduce the issue, but I will need to run full end-to-end tests to confirm. |
From @callmehiphop on October 2, 2017 20:34 @ShahNewazKhan that's great, please keep us posted! 😃 |
From @ShahNewazKhan on October 3, 2017 1:42 @callmehiphop I have been able to replicate the issue with google-cloud/pubsub patch 0.14.3 in a slightly different use case. Environment details
Steps to reproduce
At this point the message remains stuck in the pubsub queue until I reset the GKE pod 2 [pubsub subscriber app] |
From @ShahNewazKhan on October 10, 2017 21:41 Just checking in for updates on this issue. |
From @callmehiphop on October 11, 2017 18:27 @ShahNewazKhan We believe this is a GKE issue, and because of that I can't comment on whether it's being worked on or when it will be fixed. I'm really sorry for the inconvenience. |
From @ehacke on October 16, 2017 1:25 We may be having similar issues, not sure. @ShahNewazKhan what version of GKE are you on? |
From @ShahNewazKhan on October 16, 2017 2:10 GKE: 1.6.10-gke.1 |
From @kir-titievsky on October 26, 2017 18:23 Question for those who'd reported this: is there any chance you had no messages published or delivered for 10 minutes or longer before you started publishing and accumulating them in the backlog? |
From @ShahNewazKhan on October 26, 2017 19:2 @kir-titievsky I can confirm that the published messages sit in the subscription queue only when the publisher has been inactive longer than 10 minutes. |
From @kir-titievsky on October 26, 2017 19:47 Thanks @ShahNewazKhan . My guess here is this: by default, GCE suspends inactive connections after 10 minutes [1]. Since Pub/Sub relies on a persistent streamingPull connection, this connection would get suspended if no messages flow for 10 minutes. This condition was not properly detected by Pub/Sub. This was fixed as of 2017-10-20 by shutting down affected streamingPull connections. The server-initiated shutdown should now trigger the client library to rebuild the connection. Can those of you affected check if the issue persists? [1] https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet |
From @ShahNewazKhan on October 31, 2017 22:56 @kir-titievsky Can you clarify what you mean by 'server-initiated shutdown'? Does this mean that the inactive Pub/Sub streamingPull connections are now being shut down instead of being suspended by GCE? I have still noticed messages sitting in the queue intermittently. Do I have to update the Pub/Sub client to the latest version to handle the streamingPull connection rebuilds? Thanks in advance! |
I'm marking this as |
From @callmehiphop on November 27, 2017 15:37 @stephenplusplus I believe it does! |
@ShahNewazKhan: Pub/Sub servers now close streamingPull connections regularly, with a timeout shorter than GCE's 10-minute limit. This allows the client library to quickly rebuild the connections, making sure that none are stuck in a suspended state. |
Please let us know if there are still any issues. For now, this sounds like it's resolved. |
Hello, I continue to encounter this issue regularly. As described above, my pod stops receiving messages after an inactive period; as soon as I restart the pod, all messages are delivered to the new pod instance. Do I need to update anything to get the fix explained by @kir-titievsky? How can I investigate this issue? Thanks! Environment details:
|
We have experienced the same issue in the past few weeks with new deployments. Older deployments seem to work fine. A bit annoying to restart the pods daily. Environment details:
|
I think this was caused by grpc 1.8.4. I added grpc 1.7.3 as dependency and so far everything seems to be working fine. |
Related: googleapis/google-cloud-python#4737 |
I have also solved this issue by adding grpc 1.7.3 as a dependency. |
As per googleapis/nodejs-pubsub#11 (comment), the timeout problems we are having on the reporting pipeline can be solved by pinning grpc to 1.7.3. We are trying this out to see if it fixes it in our case.
Hi, any news on this issue? I have encountered the same problem after moving from Google Compute Engine to Kubernetes. I have 2 pubsub topics. One is used frequently (i.e. many messages pushed) and there is no problem. The other is used less frequently (a message pushed every 30 minutes), and after a few minutes pubsub stops receiving messages. |
@alexandreawe what version of the PubSub client are you experiencing these issues with? |
@callmehiphop I'm using v0.18.0. I have fixed this issue by looping every 15 minutes and restarting the subscription. |
I'm also seeing this behavior with the latest version of google pubsub as of this comment, 0.22.2. We have two apps running in GKE communicating with each other via pubsub, and the subscribers just stop receiving messages until the pods are restarted. At this point I guess I'm looking at looping and restarting the subscription every 15 minutes as described above, but this feels very hacky. |
I'd like to chime in and point out that we're experiencing a very similar issue in our on-prem service, where streaming pull connections stop "restarting every 10 min" after a few times (typically in about 30 min). That seems to be correlated with modAcks/acks not working as expected (i.e. all acks are indicated as "expired" after streaming pulls stop). This is described in #314 (comment) Additionally, we've experienced a very similar issue with the Java client on GKE a few months ago, and it was resolved as a "server-side" problem on behalf of Google. Is there a chance we're seeing it pop up again here? EDIT: It appears that only versions 0.23.0 and up are affected. With 0.22.2, only a fraction of acks turns up "expired", and streaming pulls don't stop. |
@plamut, this is a similar problem to the one you're seeing on Python. |
@alexandreawe @barrettc Are you doing something straightforward like this?
// assuming topic
const topic = pubsub.topic('dogs');
const subscription = topic.subscription('my-subscription');
// close and reopen the subscription every 15 minutes
setInterval(async () => {
  await subscription.close();
  subscription.open();
}, 15 * 60 * 1000);
I would love to hear about your experiences ~six months later and if this has been reliable enough. Thanks! |
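A variation on the periodic restart above, offered only as a rough sketch: restart the subscription only when it has actually been idle, instead of unconditionally every 15 minutes. `createIdleWatchdog` is a hypothetical helper, not part of the Pub/Sub client. It assumes the subscription exposes `close()` and `open()` as in the snippet above, which may not hold for every client version. The `now` parameter is only there so the clock can be replaced in tests.

```javascript
// Hypothetical helper (not part of @google-cloud/pubsub).
// Assumes `subscription` has close() (possibly async) and open(),
// as used in the snippet in the comment above.
function createIdleWatchdog(subscription, idleMs, now = Date.now) {
  let lastSeen = now();
  return {
    // Call this from the 'message' handler whenever a message arrives.
    touch() {
      lastSeen = now();
    },
    // Call this periodically, e.g. from setInterval.
    // Resolves to true if the subscription was restarted.
    async check() {
      if (now() - lastSeen < idleMs) {
        return false; // recent traffic, leave the connection alone
      }
      await subscription.close();
      subscription.open();
      lastSeen = now(); // avoid restarting again on the very next check
      return true;
    },
  };
}
```

In use, one would call `watchdog.touch()` inside the message handler and `setInterval(() => watchdog.check(), 60 * 1000)` alongside it. Choosing an idle threshold below the 10-minute limit mentioned earlier in the thread is an assumption about what helps, not a confirmed fix.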
@camsjams For our particular issue, it turns out we had a code path in which messages were not being properly acknowledged and the buildup of those messages created the perception that the subscriber had stopped working. In other words, we had |
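For anyone checking for the same cause, here is a minimal sketch of a wrapper that makes every code path end in an explicit `ack()` or `nack()`. `withGuaranteedAck` is a hypothetical name, not something from the client library or from the code discussed in this thread. It assumes the message object exposes `ack()` and `nack()`.

```javascript
// Hypothetical helper (not part of @google-cloud/pubsub).
// Wraps a handler so that a message is always either acked or nacked.
// Assumes `message` exposes ack() and nack().
function withGuaranteedAck(handler, onError = console.error) {
  return async function onMessage(message) {
    try {
      await handler(message);
      message.ack(); // handler succeeded: remove the message from the backlog
    } catch (err) {
      message.nack(); // handler failed: ask for redelivery instead of going silent
      onError(err);
    }
  };
}
```

It would be attached with something like `subscription.on('message', withGuaranteedAck(handleMessage))`, where `handleMessage` is your own function. Messages that are neither acked nor nacked stay outstanding and can make a healthy subscriber look stalled, which matches the symptom described in this comment.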
Thank you for your response. In our code we have a thin wrapper mediating PubSub activity, so it "always" acknowledges. |
From @ShahNewazKhan on October 1, 2017 9:3
Environment details
Steps to reproduce
I am facing an intermittent issue where pubsub messages are sitting in the queue and not being delivered to the subscriber in GKE pod 2. Only when I delete the GKE pod 2 subscriber and restart the pod does the message get delivered.
Copied from original issue: googleapis/google-cloud-node#2640