Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] TopicClient.sendAsync timeout after 2 minutes even if default operation timeout is 1 minute #8398

Closed
krishnasingh1818 opened this issue Oct 24, 2019 · 9 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Service Bus

Comments

@krishnasingh1818
Copy link

krishnasingh1818 commented Oct 24, 2019

Describe the bug
We are observing every day we have random 2 to 4 service bus send calls taking more that a minute to complete. some of them are successfully and some of them fail with timeout error. Interesting part is onces that fail with timeout error seem to fail after 2 minutes even though default timeout is 1 minute

Exception or Stack Trace
The operation did not complete within the allocated time 00:00:59.9999667 for object message.Reference: 9542a42a4ad14bce933bcadaffcc9fb1_G26, 08/21/2019 00:56:59 The operation did not complete within the allocated time 00:00:59.9999667 for object message.
Microsoft.Azure.ServiceBus.ServiceBusTimeoutException:
at Microsoft.Azure.ServiceBus.Core.MessageSender+d__53.MoveNext (Microsoft.Azure.ServiceBus, Version=3.0.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+d__18.MoveNext (Microsoft.Azure.ServiceBus, Version=3.0.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+d__18.MoveNext (Microsoft.Azure.ServiceBus, Version=3.0.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageSender+d__40.MoveNext (Microsoft.Azure.ServiceBus, Version=3.0.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Company.Common.Messaging.TopicService+<>c__DisplayClass12_01+<<SendAsync>b__2>d.MoveNext (Company.Common.Messaging, Version=6.0.3.0, Culture=neutral, PublicKeyToken=null) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at Polly.FallbackSyntaxAsync+<>c__DisplayClass3_1+<<FallbackAsync>b__1>d.MoveNext (Polly, Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at Polly.Fallback.FallbackEngine+<ImplementationAsync>d__11.MoveNext (Polly, Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc)
Inner exception System.TimeoutException handled at Microsoft.Azure.ServiceBus.Core.MessageSender+d__53.MoveNext:
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.AsyncResult.End (Microsoft.Azure.Amqp, Version=2.3.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Amqp.SendingAmqpLink.EndSendMessage (Microsoft.Azure.Amqp, Version=2.3.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageSender+d__53.MoveNext (Microsoft.Azure.ServiceBus, Version=3.0.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

To Reproduce
Steps to reproduce the behavior:

Code Snippet
using basic TopicClient.SendAsync() to send messages with defaults.

Expected behavior
sendAsync should respect default operationTimeout of 1 minute and it should timeout exactly at one minute instead of 2 minutes.

Screenshots
image

Setup (please complete the following information):
we have common nuget messaging package which is used from code in 1, 2 below and in both cases we see this issue.

  1. api hosted in aks 1.14.6 (microsoft/dotnet:2.1.5-aspnetcore-runtime)
  2. azure functions consumption plan
  • "Microsoft.Azure.ServiceBus" Version="3.0.2"

Additional context

  1. Mostly calls originating from aks pod to azure service bus are for sure failing if time elapsed is above 120 seconds with timeout error. There are some calls taking upto 120 seconds but still passing.
  2. Calls originating from azure functions are max taking 63 to 64 seconds and passing (why did they not timeout since default timeout is 1 minute ?)
@triage-new-issues triage-new-issues bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Oct 24, 2019
@krishnasingh1818 krishnasingh1818 changed the title [BUG] sendAsync timeout after 2 minutes even if default operation timeout is 1 minute [BUG] TopicClient.sendAsync timeout after 2 minutes even if default operation timeout is 1 minute Oct 24, 2019
@loarabia loarabia added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Service Bus labels Oct 24, 2019
@triage-new-issues triage-new-issues bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Oct 24, 2019
@ghost
Copy link

ghost commented Oct 24, 2019

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl

1 similar comment
@ghost
Copy link

ghost commented Oct 24, 2019

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl

@jfggdl jfggdl assigned jfggdl, nemakam, binzywu and axisc and unassigned jfggdl Oct 25, 2019
@jfggdl
Copy link

jfggdl commented Oct 25, 2019

@krishnasingh1818 , Thanks for creating this issue. We are trying to reproduce your problem. Would you please provide the code snippet or minimal Docker image (seems that you are using AKS and you may be using Docker) that does not include any sensitive information.

@jfggdl jfggdl added the bug This issue requires a change to an existing behavior in the product in order to be resolved. label Oct 25, 2019
@nemakam
Copy link
Contributor

nemakam commented Oct 25, 2019

@krishnasingh1818, We have recently fixed couple of issues regarding few operations taking more time than the provided timeout. The client side fixes are present in Microsoft.Azure.ServiceBus version 4.0.0 - https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/servicebus/Microsoft.Azure.ServiceBus/changelog.md.
The service has also been fixed with few of these issues and will be deployed soon. If the problem persists after a month, we could investigate further on that.

@nemakam
Copy link
Contributor

nemakam commented Oct 25, 2019

Could you also confirm if this is happening outside the scope of AKS?

@krishnasingh1818
Copy link
Author

@krishnasingh1818, We have recently fixed couple of issues regarding few operations taking more time than the provided timeout. The client side fixes are present in Microsoft.Azure.ServiceBus version 4.0.0 - https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/servicebus/Microsoft.Azure.ServiceBus/changelog.md.
The service has also been fixed with few of these issues and will be deployed soon. If the problem persists after a month, we could investigate further on that.

am trying this fix and will update with results.

@krishnasingh1818
Copy link
Author

krishnasingh1818 commented Nov 5, 2019

I updated all our microservices to use new Microsoft.Azure.ServiceBus version 4.1.0 but am seeing lot of System.OperationCanceledException being generated in application insight logs. Am noting that in most cases they not impacting transactions but its filling up app insight logs with exceptions noise.

also i ran query in appinsight for dependencies to service bus and am not seeing durations of minutes or 2 minutes to complete service bus calls. so version upgrade might have fixed the long duration issue but now generating lot of exceptions in logs as mentioned above. I see pr created to fix OperationCanceledException in 4.1.1 . will this fix this issue. - > #8449. please review and advise.

System.OperationCanceledException: at System.Threading.CancellationToken.ThrowOperationCanceledException (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__33.MoveNext (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at Microsoft.Azure.ServiceBus.MessageReceivePump+<MessagePumpTaskAsync>d__11.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

@nemakam
Copy link
Contributor

nemakam commented Nov 9, 2019

@krishnasingh1818 yes, #8449 should fix OperationCanceledException which should be released in 4.1.1 version

@nemakam nemakam closed this as completed Nov 9, 2019
@ghost
Copy link

ghost commented Nov 9, 2019

Thanks for working with Microsoft on GitHub! Tell us how you feel about your experience using the reactions on this comment.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Service Bus
Projects
None yet
Development

No branches or pull requests

6 participants