
[azservicebus] ASB client stopped receiving messages for its subscription. #17408

Closed
VirajSalaka opened this issue Mar 30, 2022 · 7 comments
Labels: customer-reported, needs-team-attention, question, Service Bus

Comments

@VirajSalaka

Bug Report
pkg: github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus

SDK version: v0.3.6

go version: go 1.16.13

What happened?

One of the ASB clients does not receive any messages for one of its subscriptions.

In our environment, there are several instances running, and each instance subscribes to a set of topics. For a given topic, each instance creates its own subscription, so the subscription is unique per instance. We noticed that one ASB client failed to receive messages for the topic called notification, while the other instances continued to receive them. At the same time, the affected instance had no issue with the other topics it subscribed to. [1]
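For context, a minimal sketch of the listener pattern (simplified, not the exact code in [1]; the method names follow the azservicebus package, but the signatures shown are from a newer SDK version and may differ slightly from 0.3.6; connection string and subscription name are placeholders from the environment):

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus"
)

func main() {
	// One client per instance; the connection string comes from the environment here.
	client, err := azservicebus.NewClientFromConnectionString(os.Getenv("SERVICEBUS_CONNECTION_STRING"), nil)
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer client.Close(context.Background())

	// Each instance uses its own subscription name for the topic,
	// so the subscription is unique per instance.
	receiver, err := client.NewReceiverForSubscription("notification", os.Getenv("INSTANCE_SUBSCRIPTION"), nil)
	if err != nil {
		log.Fatalf("failed to create receiver: %v", err)
	}

	ctx := context.Background()
	for {
		// Blocks until at least one message arrives (or ctx is cancelled).
		messages, err := receiver.ReceiveMessages(ctx, 10, nil)
		if err != nil {
			log.Printf("receive error: %v", err)
			continue
		}
		for _, msg := range messages {
			log.Printf("received: %s", msg.MessageID)
			if err := receiver.CompleteMessage(ctx, msg, nil); err != nil {
				log.Printf("complete error: %v", err)
			}
		}
	}
}
```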

Error:
No specific Error message

What did you expect or want to happen?

If there are events for the topic queued under a given subscription, we expect to receive all those events until our application is killed/exited.

Anything we should know about your environment.
Our service is running on AKS

Logs:
query_data_asb_issue_3.csv

[1] https://github.com/wso2/product-microgateway/blob/ddc47b9c07397d44d50659e0fb1248a1fee76656/adapter/pkg/messaging/azure_listener.go#L70

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 30, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Mar 30, 2022
@richardpark-msft
Member

Just wanted to confirm, are you sure this log comes from a machine running 0.3.6?

We had a bug in 0.3.6 where we retried when a message lock was lost, and your log doesn't look like it exhibits that bug.

Is it possible this log is coming from an earlier version of the package? There were some substantial improvements in 0.3.6's reliability (ignoring the bug) and even more are coming in the release next week (0.3.7).

@richardpark-msft richardpark-msft added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Mar 30, 2022
@VirajSalaka
Author

Thanks for the reply. I have checked again, and it is the 0.3.6 tag we were using at that time.

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Mar 31, 2022
@hilariocoelho

Faced the same issue, also using 0.3.6

I'm now updating to 0.4.0 and hope that fixes it.

@richardpark-msft
Member

If either of you is able to capture this failure in a log, that would help. We did fix this particular issue in 0.4.0:

"Fixed issue where a message lock expiring would cause unnecessary retries. These retries could cause message settlement calls (ex: Receiver.CompleteMessage) to appear to hang. (#17382)"

That could explain why it seems to hang. What you'd see in the log (if this were the case) is several retries, with a 410 error as the underlying cause.

Now, with that fix in place, the way this can still happen is through a lock expiration, which could indicate that you need to call receiver.RenewLock() if you plan on processing for longer than the configured lock duration on your queue/subscription.
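A rough sketch of what that could look like (method names and signatures here follow a newer azservicebus version, so adjust to the version you're on; handleMsg is a placeholder for your processing function):

```go
import (
	"context"
	"log"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus"
)

// processWithLockRenewal keeps the message lock alive while a long-running
// handler runs, then settles the message based on the handler's result.
func processWithLockRenewal(ctx context.Context, receiver *azservicebus.Receiver, msg *azservicebus.ReceivedMessage, handleMsg func(*azservicebus.ReceivedMessage) error) error {
	renewCtx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Renew the lock periodically, well within the configured lock duration.
	go func() {
		ticker := time.NewTicker(30 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-renewCtx.Done():
				return
			case <-ticker.C:
				if err := receiver.RenewMessageLock(renewCtx, msg, nil); err != nil {
					log.Printf("lock renewal failed: %v", err)
					return
				}
			}
		}
	}()

	if err := handleMsg(msg); err != nil {
		return receiver.AbandonMessage(ctx, msg, nil)
	}
	return receiver.CompleteMessage(ctx, msg, nil)
}
```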

@hilariocoelho

Hello @richardpark-msft

I have a service that has been running for 4 days and it is still consuming properly with 0.4.0. I think it's fixed.

@richardpark-msft
Member

Hi all, I believe we're at the point where we've investigated and fixed the original issue. I'm closing this, but we can re-open in the future if this comes back.

@VirajSalaka
Author

Hi @richardpark-msft ,

We have already moved to SDK version v0.4.0 (Go 1.18.1), and we observed the same issue a couple of times recently. This time we had the SDK debug logs enabled too.

Let me explain the nature of this issue. In our application logic, we call the ReceiveMessages method, iterate over each message, and add it to a Go channel (with no buffer). Once the channel's item is consumed, the next message is added. In this specific scenario, the consumer of this channel was paused for about 10-15 minutes due to a database connectivity issue, so the CompleteMessage call for the received message was delayed by the same duration. After that, no further messages were received by the ASB client.
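Roughly, the flow looks like this (a simplified sketch, not our exact code; process stands in for the downstream consumer, and receiver and ctx are as in the setup sketch earlier in this issue):

```go
msgChan := make(chan *azservicebus.ReceivedMessage) // unbuffered

// Consumer: processes one message at a time, then completes it.
go func() {
	for msg := range msgChan {
		process(msg) // this is where the ~10-15 minute pause happened
		if err := receiver.CompleteMessage(ctx, msg, nil); err != nil {
			log.Printf("complete error: %v", err)
		}
	}
}()

// Producer: each send blocks until the consumer has finished the previous
// message and is ready to receive again, so a slow consumer delays both the
// next receive and the CompleteMessage call.
for {
	messages, err := receiver.ReceiveMessages(ctx, 1, nil)
	if err != nil {
		log.Printf("receive error: %v", err)
		continue
	}
	for _, msg := range messages {
		msgChan <- msg
	}
}
```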

In the attached log, the pause happened for the message with ID 9eaf9907-b530-4391-92da-560fb079c2d5:9100:1:1-1. It was received at 9:25:00 and completed at 9:40:44, and it was the last event received.

asb_issue_23_06.csv

@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023