"duplicate durable registration" error after "subscribe request timeout" #1135
Comments
@etrochim You are correct. As of now, I can't think of any other workaround. You clearly understand what the issue is and how to remedy it, for now.
When the subscription request for a durable subscription times out or fails on the client side but was accepted by the server, any attempt by the application to reissue the subscription request fails with a "duplicate durable subscription" error until the connection is closed.

This new option lets the user decide how the server should behave when processing a duplicate durable subscription. If disabled (the default), the server behaves as described above: it rejects the second subscription request and returns the "duplicate durable" error. If enabled, when the server detects a duplicate it closes the active subscription and accepts the new one, which amounts to a suspend followed by a resume.

From the client's perspective, if this is done in the context of #1135, everything works well, since the original subscription was never actually started in the client because the subscription request failed. However, if users create multiple duplicate durable subscriptions through subscription requests (Subscribe() calls) that did not fail, their application will not be notified that the earlier subscriptions are being replaced; those subscriptions will simply stop receiving messages.

Resolves #1135

Signed-off-by: Ivan Kozlovic <[email protected]>
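To make the two behaviours described in this commit message concrete, here is a minimal toy model in Python. It is not the server's implementation: the class name, the `replace_duplicates` flag, and the choice of `(client_id, channel, durable_name)` as the lookup key are all assumptions made for illustration.

```python
class DuplicateDurableError(Exception):
    """Stands in for the server's "duplicate durable" error reply."""


class DurableRegistry:
    """Toy model of a server tracking active durable subscriptions.

    All names here are hypothetical; `replace_duplicates` plays the role
    of the new option described in the commit message.
    """

    def __init__(self, replace_duplicates=False):
        self.replace_duplicates = replace_duplicates
        self.active = {}   # (client_id, channel, durable_name) -> subscription id
        self.closed = []   # ids of subscriptions closed because they were replaced
        self._next_id = 1

    def subscribe(self, client_id, channel, durable_name):
        key = (client_id, channel, durable_name)
        if key in self.active:
            if not self.replace_duplicates:
                # Default behaviour: reject the second request.
                raise DuplicateDurableError("duplicate durable registration")
            # Option enabled: close the active subscription (suspend), then
            # fall through and accept the new one (resume). The holder of
            # the old subscription is not notified.
            self.closed.append(self.active.pop(key))
        sub_id = self._next_id
        self._next_id += 1
        self.active[key] = sub_id
        return sub_id
```

With the flag left at its default, a second `subscribe()` for the same key raises the error. With the flag set, the second call returns a new subscription id and the first id ends up in `closed`, which mirrors the caveat that a replaced subscription silently stops receiving messages.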
I've been testing how my stan clients, written using nats.c, behave in the face of network problems introducing high packet loss (which, unfortunately, happens on rare occasions between two of our geographically dispersed sites). During my testing I've found that on some occasions, when resubscribing after a connection failure, the subscription request will fail with "subscribe request timeout" then, after retrying, will fail with "duplicate durable registration". All subsequent resubscribe attempts will fail with that message for the duration of the connection.
The reconnect logic my stan clients use is pretty straightforward and, I believe, matches the recommendation:
I'm simulating an unreliable network by using Linux's NetEm. I've configured the network interface to rate limit to 150kbit and 40% packet loss (unrealistically high but appears to be quite good at hitting many possible failure cases when running for hours/days at a time). Most of the time the re-connection logic works correctly, except, of course, for the originally stated problem.
I haven't confirmed this, but it appears that the SubscriptionRequest message reaches the server while the SubscriptionResponse message doesn't get back to the client before the ConnectionWait period elapses. This leaves the server thinking the client has a valid subscription while the client disagrees, which traps the client in a state where the subscription request can never be fulfilled until the connection stops and starts again.
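The suspected sequence can be replayed with a small Python simulation. This is only a sketch of the hypothesis above, not of the real protocol: `FakeServer`, `client_subscribe`, and the `response_lost` switch are invented for illustration, and a lost response is modelled as an immediate timeout on the client.

```python
class FakeServer:
    """Hypothetical stand-in for the server's view of durable subscriptions."""

    def __init__(self):
        self.durables = set()  # (connection id, durable name) pairs

    def handle_subscribe(self, conn_id, durable):
        if (conn_id, durable) in self.durables:
            return "duplicate durable registration"
        self.durables.add((conn_id, durable))
        return "ok"

    def handle_close(self, conn_id):
        # Closing the connection releases every durable it held.
        self.durables = {d for d in self.durables if d[0] != conn_id}


def client_subscribe(server, conn_id, durable, response_lost):
    """Return what the client observes for one subscription request."""
    reply = server.handle_subscribe(conn_id, durable)  # the request always arrives
    if response_lost:
        return "subscribe request timeout"  # the reply never makes it back
    return reply


def replay():
    server = FakeServer()
    outcomes = [
        client_subscribe(server, "conn-1", "dur", response_lost=True),
        client_subscribe(server, "conn-1", "dur", response_lost=False),
        client_subscribe(server, "conn-1", "dur", response_lost=False),
    ]
    server.handle_close("conn-1")  # connection stops and starts again
    outcomes.append(client_subscribe(server, "conn-1", "dur", response_lost=False))
    return outcomes


print(replay())
# ['subscribe request timeout', 'duplicate durable registration',
#  'duplicate durable registration', 'ok']
```

The first request registers the durable on the server even though the client sees a timeout, so every retry on the same connection is rejected. Only after the connection is closed does the request succeed.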
Out of curiosity I tested the same scenario with nats-replicator. It too will get trapped in a "duplicate durable registration" error loop until the connection dies again.
Setting the ConnectionWait time to something higher (I've been using 10 seconds) significantly reduces the occurrence of the problem but does not entirely eliminate it.
The only workaround I've been able to find is to simply destroy the connection entirely and reconnect when this error is encountered. However, that seems like a fairly drastic solution. Is there a better workaround for this?
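For reference, the workaround described above has roughly the following shape, written here as a library-agnostic Python sketch rather than against the nats.c API. The `connect` factory, the connection's `subscribe(durable)` and `close()` methods, and the use of `RuntimeError` with the server's message text are all assumptions made for illustration.

```python
DUPLICATE_DURABLE = "duplicate durable registration"


def subscribe_with_reset(connect, durable, max_attempts=5):
    """Subscribe, replacing the whole connection on a duplicate-durable error.

    `connect` is a hypothetical factory returning an object that offers
    `subscribe(durable)` (raising RuntimeError on failure) and `close()`.
    Returns the (connection, subscription) pair that finally succeeded.
    """
    conn = connect()
    for _ in range(max_attempts):
        try:
            return conn, conn.subscribe(durable)
        except RuntimeError as err:
            if DUPLICATE_DURABLE in str(err):
                # The server still holds the durable for this connection,
                # so drop the connection and start over with a fresh one.
                conn.close()
                conn = connect()
            # Any other error (for example a timeout) is retried on the
            # same connection on the next loop iteration.
    conn.close()
    raise RuntimeError(
        f"could not subscribe to {durable!r} after {max_attempts} attempts"
    )
```

The sketch keeps the connection for ordinary timeouts and only tears it down when the duplicate-durable error shows that the server and client disagree about the subscription. A real client would likely also want a delay between attempts.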