[ServiceBus] Fix an issue where user error handler is not called #19189
Conversation
and change `testError()` to only verify a non-empty `entityPath`, thus allowing verification of error messages with different formats.
and invoke the user error handler in that case.
/azp run js - service-bus - tests
Azure Pipelines successfully started running 1 pipeline(s).
Known browser test failure after removing "assert" dependency
Looks good!
I believe it was intentional to keep retrying in the streaming receiver even for non-retryable errors as long as the user was notified of it. Please check with @richardpark-msft and @JoshLove-msft regarding past discussions on this. If so, then for #18798, we may want to specifically look for MessagingEntityNotFound to break out of the loop.
This fixes an issue where the user error handler is not called when subscribing to a valid topic but an invalid or disabled subscription.

The problem is that in these two cases, the service accepts our client's cbs claim negotiation, and the receiver link is created. The [client receiver enters](https://github.com/Azure/azure-sdk-for-js/blob/14099039a8009d6d9687daf65d22a998e3ad7920/sdk/servicebus/service-bus/src/core/streamingReceiver.ts#L539) the `retryForeverFn` to call `_initAndAddCreditOperation()`, where the following

```ts
await this._messageHandlers().postInitialize();
this._receiverHelper.addCredit(this.maxConcurrentCalls);
```

will return successfully, even though the fire-and-forget `addCredit()` call later leads to `MessagingEntityDisabled` or `MessagingEntityNotFound` errors in the underlying link. Since no errors are thrown in our retry-forever loop, the `onErrors()` [callback](https://github.com/Azure/azure-sdk-for-js/blob/14099039a8009d6d9687daf65d22a998e3ad7920/sdk/servicebus/service-bus/src/core/streamingReceiver.ts#L541) is not invoked. That callback is where the user error handler is called.

Because of the error, the link is detached. We have code to re-initialize the link when errors happen, so we call `_subscribeImpl()` where the `retryForeverFn()` is called again. This goes on in an endless loop.

This PR adds code to invoke the user error handler in `_onAmqpError()` when the error code is `MessagingEntityDisabled` or `MessagingEntityNotFound`, so users have a chance to stop the infinite loop.

There's another problem. When users call `close()` on the subscription in their error handler, `this._receiverHelper.suspend()` is called to suspend the receiver. However, when re-connecting we call `this._receiverHelper.resume()` again in `_subscribeImpl()`. This essentially resets the receiver to be active, and we will not abort the attempt to initialize the connection: https://github.com/Azure/azure-sdk-for-js/blob/14099039a8009d6d9687daf65d22a998e3ad7920/sdk/servicebus/service-bus/src/core/streamingReceiver.ts#L578

To fix this, the PR moves the `resume()` call out. It is supposed to be called only to enable the receiver before subscribing; it should not be called when we try to re-connect.
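As a rough illustration of the change described above (this is not the actual diff; `reportUserError` and the surrounding plumbing are placeholders for whatever `StreamingReceiver` uses internally to reach the user's error handler):

```ts
// Hypothetical sketch only: names like `reportUserError` are placeholders,
// not the real StreamingReceiver internals.
import { ServiceBusError } from "@azure/service-bus";

// Errors that indicate the entity itself is unusable, so retrying forever
// without telling the user would just spin.
const entityErrors = ["MessagingEntityDisabled", "MessagingEntityNotFound"];

function onAmqpErrorSketch(
  sbError: ServiceBusError | undefined,
  reportUserError: (err: Error) => Promise<void>
): void {
  if (sbError?.code && entityErrors.includes(sbError.code)) {
    // Surface the error to the user's error handler; the user can then call
    // close() on the subscription to break out of the reconnect loop.
    reportUserError(sbError).catch(() => {
      // ignore failures from the user's handler in this sketch
    });
  }
}
```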
A different fix now. Please re-review
@@ -203,6 +203,17 @@ export class StreamingReceiver extends MessageReceiver {
      sbError,
      `${this.logPrefix} 'receiver_error' event occurred. The associated error is`
    );
    if (
      sbError?.code &&
      ["MessagingEntityDisabled", "MessagingEntityNotFound"].includes(sbError.code)
Only targeting these two errors now to limit the impact.
/azp run js - service-bus - tests
Azure Pipelines successfully started running 1 pipeline(s).
@ramya-rao-a @deyaaeldeen please have another look.
Looks good to me based on the PR description but I would wait for @ramya-rao-a's approval. I left a couple nitpicky comments but feel free to ignore them.
I wonder if anything here is applicable to Event Hubs though.
The fix looks good to me, I only have 2 concerns
Event Hubs does not have a counterpart to Service Bus's topic-subscription model. So, I would imagine any such errors should occur when the receiver is created just like how it happens with Service Bus's queues. Event Hubs does have the concept of a consumer group and partitions though. So, you can try the combination of
I tried a .NET test earlier and their user error handler was called for this error. Let me take another look at what .NET does differently.
@ramya-rao-a sounds good! I created #19610.
If I may add to what @ramya-rao-a suggests: as a client, I would like to be able to make informed decisions and close any connections/stop connection attempts gracefully. If this requires exposing internal errors (which also seems most flexible), then it would require good docs on what's what. Seeing which errors are the "worthy" ones, it's obvious this option is the correct one, or at least consistent with how current samples are written. If however some internal errors are effectively equal, it would be nice to wrap them in higher-level errors - just to hide the complexity.
Thanks for the input @zoladkow! We will take it into consideration in the follow-up discussions for this issue. Some extra context: I believe the general rule of thumb has been that any need for deeper control than automatic retries is met by the other APIs on the receiver, like `receiver.receiveMessages()` or `receiver.getMessageIterator()`. Both of these methods surface each and every error seen when receiving messages.
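To illustrate the distinction being made here, a minimal sketch using the public @azure/service-bus API (the connection string and queue name are placeholders, not from this PR):

```ts
import { ServiceBusClient } from "@azure/service-bus";

const client = new ServiceBusClient("<connection-string>"); // placeholder
const receiver = client.createReceiver("<queue>"); // placeholder

// Pull model: receiveMessages() (and getMessageIterator()) throw directly to the
// caller, so every error is surfaced to your own try/catch.
async function pullOnce(): Promise<void> {
  try {
    const messages = await receiver.receiveMessages(10, { maxWaitTimeInMs: 5000 });
    for (const message of messages) {
      await receiver.completeMessage(message);
    }
  } catch (err) {
    // You decide here whether to retry, back off, or close the receiver.
    console.error("receiveMessages failed:", err);
  }
}

// Push model: subscribe() retries internally and reports errors through processError.
const subscription = receiver.subscribe({
  processMessage: async (message) => {
    console.log("received:", message.body);
  },
  processError: async (args) => {
    console.error(`error from ${args.errorSource}:`, args.error);
  },
});
// Keep `subscription` around so you can call subscription.close() when done.
```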
Wait, when you say "convenience", do you mean it's actually using those other API methods underneath? Or do you mean "convenience" only in the sense that retries are handled by default?
@zoladkow You are right. No, we are not calling
I checked with .NET and Python for this error.
I think this PR is fixing a bug where this error is not reported to the user in JS, because we have a different way of handling errors and then reconnecting in JS, and no error is thrown in our retry code. To end users there aren't two types of errors.
Yes, @richardpark-msft also mentioned that there's no reason not to do this for all errors. I was a bit conservative, but I can remove the limit to just these two error kinds.
@jeremymeng In fact, the ASB resource being used in this case has IP restrictions applied by corporate policy; however, the IP of the ASB service is still public, which is why the browser can connect using my public outbound IP. In any case, that is the root cause, but it would still be nice to have a chance to act upon such a situation in the client code. Again, I can see two situations:
Final AMQP messages from the above screenshot (copied as hex):
Please do let me know if I should create a separate issue.
@zoladkow my current plan is to expose all errors via the user error handler. You might see more types of errors in the error handler after my change, but it should cover the other error you were seeing too.
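Once such errors reach the user error handler, a handler could act on them along these lines. This is a sketch of one possible pattern (connection string, topic, and subscription names are placeholders), not a sample from this PR:

```ts
import { ServiceBusClient, isServiceBusError } from "@azure/service-bus";

const client = new ServiceBusClient("<connection-string>"); // placeholder
const receiver = client.createReceiver("<topic>", "<subscription>"); // placeholder

const subscription = receiver.subscribe({
  processMessage: async (message) => {
    console.log("received:", message.body);
  },
  processError: async (args) => {
    if (
      isServiceBusError(args.error) &&
      ["MessagingEntityDisabled", "MessagingEntityNotFound"].includes(args.error.code)
    ) {
      // The entity itself is unusable; stop the subscription instead of
      // letting the SDK reconnect forever.
      console.error(`Entity ${args.entityPath} is unusable, closing subscription.`);
      await subscription.close();
      await receiver.close();
      await client.close();
    } else {
      console.error("error, SDK will keep retrying:", args.error);
    }
  },
});
```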
instead of limiting to the two specific entity errors.
* This handler will be called for any error that occurs in the receiver when
*  - receiving the message, or
*  - executing your `processMessage` callback, or
*  - the receiver automatically completes or abandons the message.
* - the receiver automatically completes or abandons the message.
* - receiver is completing the message on your behalf after successfully running your `processMessage` callback and `autoCompleteMessages` is enabled
* - receiver is abandoning the message on your behalf if running your `processMessage` callback fails and `autoCompleteMessages` is enabled
* - receiver is renewing the lock on your behalf due to auto lock renewal feature being enabled
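To ground the suggested wording above, a small sketch (placeholders for the connection string and queue name; not part of the PR) showing where `autoCompleteMessages` and automatic lock renewal are configured, and that failures from those on-your-behalf operations also flow into `processError`:

```ts
import { ServiceBusClient } from "@azure/service-bus";

const client = new ServiceBusClient("<connection-string>"); // placeholder
const receiver = client.createReceiver("<queue>", {
  // Lock renewal happens on your behalf for up to this long; failures during
  // renewal are reported through processError.
  maxAutoLockRenewalDurationInMs: 5 * 60 * 1000,
});

receiver.subscribe(
  {
    processMessage: async (message) => {
      console.log("received:", message.body);
    },
    processError: async (args) => {
      // Also invoked when the on-your-behalf complete/abandon or lock renewal fails.
      console.error(`error (source: ${args.errorSource}):`, args.error);
    },
  },
  {
    // With autoCompleteMessages enabled (the default), the receiver completes or
    // abandons the message for you after processMessage resolves or rejects.
    autoCompleteMessages: true,
  }
);
```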
Should we add something to reflect that the receiver will automatically retry on all errors?
The retry behavior has not been changed in this PR though, since the non-retryable classification was removed earlier. We could add some clarification in the `subscribe()` ref doc?
I agree that the behavior around retry has not changed. When I see docs around the different error scenarios, my next thought is around what I should do about them. Now that we are clarifying the kinds of errors, it would also help to clarify that we do indeed retry on all errors, and that perhaps what the user can do is log them on their side if needed. Maybe also point them to the ref docs on ServiceBusError to see if there are any errors for which they might want to stop receiving altogether.
We have some good comments in one sample: https://github.com/Azure/azure-sdk-for-js/blob/83d3df06fd47c31a244b6b3c7e9de9e4c63c4f7e/sdk/servicebus/service-bus/samples-dev/receiveMessagesStreaming.ts
I will adopt them here and add a link to the errors.
Co-authored-by: Ramya Rao <[email protected]>
and link to ref doc on service bus errors.
@@ -524,10 +537,6 @@ export class StreamingReceiver extends MessageReceiver {
   */
  private async _subscribeImpl(caller: "detach" | "subscribe"): Promise<void> {
    try {
      // this allows external callers (ie: ServiceBusReceiver) to prevent concurrent `subscribe` calls
      // by not starting new receiving options while this one has started.
Can we retain this comment for resume()?
Done. Re-attached it to the resume() call
Reads strange. Is this for the whole `_subscribeImpl()` method instead?
Actually, I am not sure what part of the code here is preventing concurrent subscribe calls...
Chatted with @richardpark-msft offline. We agreed that the comments can be removed.
Also fixed a linting issue. The linter insists that "@" in TSDoc should be escaped. The URL still works after adding "\".
This fixes an issue where the user error handler is not called when subscribing to a valid topic but an invalid or disabled subscription. Issue #18798

The problem is that in these cases, the service accepts our client's cbs claim negotiation, and the receiver link is created. The client receiver enters the `retryForeverFn` to call `_initAndAddCreditOperation()`, which will return successfully even though the fire-and-forget `addCredit()` call later leads to `MessagingEntityDisabled` or `MessagingEntityNotFound` errors in the underlying link. Since no errors are thrown in our retry-forever loop, the `onErrors()` callback is not invoked. It is one place where the user error handler is called.

Because of the error, the link is detached. We have code to re-establish the link when errors happen, so we call `_subscribeImpl()` where the `retryForeverFn()` is called again. This goes on in an endless loop.

We used to invoke the user error handler and would not attempt to re-establish connections when errors were considered non-retryable. In PR #11973 we removed the classification of errors that `subscribe()` used and instead continue to retry infinitely. We also removed the code to invoke the user error handler. This PR adds code to invoke the user error handler in `_onAmqpError()` so users have a chance to stop the infinite loop.

There's another problem. When users call `close()` on the subscription in their error handler, `this._receiverHelper.suspend()` is called to suspend the receiver. However, when re-connecting we call `this._receiverHelper.resume()` again in `_subscribeImpl()`. This essentially resets the receiver to be active and we will not abort the attempt to initialize the connection:

azure-sdk-for-js/sdk/servicebus/service-bus/src/core/streamingReceiver.ts, lines 574 to 579 in 1409903

To fix this, the PR moves the `resume()` call out. It is supposed to be called only to enable the receiver before subscribing; it should not be called when we try to re-connect.