[BUG] ASB consumer stops with "ReactorDispatcher instance is closed", only recovers after restart #25085
Comments
The issue happened again for 2 applications. One thing I noticed about all the problems since the library upgrade from 7.2.0 to 7.4.1 is that only applications that connect to more than one Service Bus resource seem to be affected, though that may be a coincidence. Affected applications:
Two other services, which each read from a single ASB resource, have not been affected by this problem so far. |
Thanks for reporting this @p4p4. @anuchandy can you please follow up? /cc @ki1729 |
According to Azure Support (TrackingID#2110280050001032), there was an OS update of the Service Bus at the time of the incidents, which also led to server errors. This was possibly the root cause of the problem, and the SDK client did not recover from it. The support referred to
|
Hi Patrick, thank you for the extra context on the service upgrade. Sharing the SDK DEBUG-level logs for the time frame around the incident (about 20 minutes before and after) will help us understand the activity on the link and what could have led the receiver to stop. |
Hi Anu, all logs >= level INFO unfortunately DEBUG logging was not active for the |
Hi. My customer is experiencing the same issue. Is there any update on this? |
@p4p4 Unfortunately, there is not enough information in the INFO log to confirm why this happened. The only thing I could find is a graceful closure of the endpoint (without error). The only known (fixed) issue we could map this to is https://github.com/Azure/azure-sdk-for-java/blob/azure-core-amqp_2.3.3/sdk/core/azure-core-amqp/CHANGELOG.md#bugs-fixed, which is addressed in SB 7.4.2, but you're on 7.4.1. Again, I'm unable to confirm this, since the error pattern is visible only at DEBUG level. @jiyongseong the "ReactorDispatcher instance is closed" message is misleading (I think we suppressed it in a recent version because it isn't actually the cause of errors; I need to check). If you have any DEBUG logs, we can see what happened before this exception. |
gaspicmsapp.log.zip |
@p4p4 Multiple reliability issues have been fixed since 7.4.1, including fixes in the recovery route that the SDK goes through during service upgrades. Closing this ticket for now. Please try out the latest 7.7.x, and if you still run into an issue, please reopen (with DEBUG logs). |
Hi @anuchandy, unfortunately we are still seeing the error containing the following information with version 7.7.0:
We also have opened a ticket, and supplied the debug log details with TrackingID#2202140050000661 but it seems that the details did not reach the right team, or got stuck somewhere... Thank you for all your efforts and enhancements in the stability area. Best regards, |
Hi @hargut, is this an intermittent error that does not stop the receiver, or does it leave the receiver in a state where it no longer produces messages? I think what is happening here is that the message processing took some time, and by the time the application was ready to complete the message, the original link on which it was delivered no longer existed (e.g., because of a transient error), and the library had created, or was creating, a new link to continue receiving. It is impossible to complete a message on a link different from the one it was received on. Here is a related comment #26761 (comment) |
I double-checked with the Service Bus team. Currently, completing a message relies on the lock token; each amqp-link object tracks the tokens associated with the messages it produces. So, by service design, once the amqp-link is closed (due to a transient error, timeout, etc.), those records no longer exist, and completion of the related messages is rejected. As usual, uncompleted messages will be redelivered (as long as delivery_count <= max_delivery_count). The error message you saw above ("Cannot update disposition with no link") is a client-side error raised because the client identified that the link no longer exists. The general recommendation is that an application taking more than a couple of minutes to process a message should be designed to handle redelivery of messages. |
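The recommendation above, designing the handler to tolerate redelivery, can be sketched in plain Java. This is a minimal illustration, not code from the SDK or from this thread: `IdempotentHandler` and its methods are hypothetical names, the message ID is assumed to stay the same across redeliveries, and the in-memory set would need to be replaced by a durable store in a real service.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper (not part of azure-messaging-servicebus): remembers which
// message IDs were already handled so a redelivered copy is not applied twice.
final class IdempotentHandler {
    // Assumption: in-memory only; entries are lost on restart.
    private final Set<String> handledIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> businessLogic;

    IdempotentHandler(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    /**
     * Runs the business logic at most once per message ID.
     * Returns true if the logic ran, false if this ID was already handled.
     */
    boolean handle(String messageId, String body) {
        if (!handledIds.add(messageId)) {
            return false; // duplicate delivery: skip the side effects
        }
        try {
            businessLogic.accept(body);
            return true;
        } catch (RuntimeException e) {
            handledIds.remove(messageId); // let a later redelivery retry
            throw e;
        }
    }
}
```

With a handler of this shape, a message whose completion was rejected because its link had closed can be delivered again without its side effects being applied a second time.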
Hi @anuchandy, thank you very much for your clarifications. As the stack trace shows that this is related to the .complete() call on the ServiceBusReceiverAsyncClient, we will change the flow to continue on this specific error and allow for a redelivery of the message. Have a great time. Best regards, |
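The change described here, continuing when completion fails with this specific error and letting the service redeliver, might look roughly like the following. It is a synchronous, stdlib-only sketch under stated assumptions: `TolerantSettlement`, `tryComplete` and `leftForRedelivery` are hypothetical names, the `Runnable` stands in for the real `.complete()` call, and the caller decides how to recognise the error. With the async client the same decision would sit in an error-handling operator such as Reactor's `onErrorResume`.

```java
import java.util.function.Predicate;

// Hypothetical wrapper (not an SDK type): treats one recognised settlement
// failure as non-fatal and propagates every other error unchanged.
final class TolerantSettlement {
    private final Predicate<RuntimeException> isLinkGone;
    private int leftForRedelivery;

    TolerantSettlement(Predicate<RuntimeException> isLinkGone) {
        this.isLinkGone = isLinkGone;
    }

    /**
     * Runs the completion call. Returns true if it succeeded, false if it
     * failed with the recognised error and the message is left for redelivery.
     */
    boolean tryComplete(Runnable completeCall) {
        try {
            completeCall.run();
            return true;
        } catch (RuntimeException e) {
            if (isLinkGone.test(e)) {
                leftForRedelivery++; // message stays uncompleted on the broker
                return false;
            }
            throw e; // anything else is still a real failure
        }
    }

    int leftForRedelivery() {
        return leftForRedelivery;
    }
}
```

A caller could pass a predicate that matches the message text quoted earlier in the thread ("Cannot update disposition with no link"). Matching on message text is brittle and is only an assumption here; the exception type to match is not stated in this thread.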
Hi @hargut Do you have any progress on this issue? We are facing the same problem. |
Describe the bug
The behaviour looks quite similar to this issue, although we are using recent library versions:
Azure/azure-service-bus-java#335
Multiple (independent) applications reading from different topics/queues stopped consuming messages from the Service Bus in roughly the same time span (see the increase in active messages).
We recently upgraded to 'azure-messaging-servicebus' version '7.4.1' and had not seen this behaviour before, but shortly after upgrading it happened with 2 different Service Bus namespaces.
Exception or Stack Trace
The logs of "orderdataservice" showed the following errors:
The logs of "userdataservice" look similar:
To Reproduce
Code Snippet
The consumer is created as a Spring bean as follows and is not closed anywhere
Expected behavior
Consumers recover by themselves after network issues and do not get closed.
(Note that our code does not explicitly close processor clients anywhere.)
Setup (please complete the following information):
Additional context
Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report
Please let me know if I can provide more information to you.